Abstract: The amount of personal information contributed by 
individuals to digital repositories such as social network sites 
has grown substantially. The existence of this data offers
unprecedented opportunities for data analytics research in various
domains of societal importance including medicine and public 
policy. The results of these analyses can be considered a public 
good which benefits data contributors as well as individuals who 
are not making their data available. At the same time, the release 
of personal information carries perceived and actual privacy risks 
to the contributors. Our research addresses this problem area. 

In our work, we study a game-theoretic model in which 
individuals take control over participation in data analytics 
projects in two ways: 1) individuals can contribute data at 
a self-chosen level of precision, and 2) individuals can decide 
whether they want to contribute at all. From the analyst’s 
perspective, we investigate to which degree the research analyst 
has flexibility to set requirements for data precision, so that 
individuals are still willing to contribute to the project, and the 
quality of the estimation improves. 

We study this tradeoff scenario for populations of homogeneous
and heterogeneous individuals, and determine Nash
equilibria that reflect the optimal level of participation and 
precision of contributions. We further prove that the analyst can 
substantially increase the accuracy of the analysis by imposing a 
lower bound on the precision of the data that users can reveal. 

Index Terms: Non-cooperative game, public good, privacy, 
population estimate, data analytics, non-monetary incentives 

I. Introduction 


A. Background 

The seminal “How much Information Project?” report published
in 2000 concluded that between 1 and 2 exabytes of
unique information were produced worldwide per year which 
translated into about 250 megabytes of information for every 
human being [1], [2]. While those figures were (and are still)
largely driven by commercial production of information, in 
recent years the amount of personal information produced 
by individuals has grown substantially. Now, Facebook alone 
absorbs about 220 petabytes of user-contributed data each year 
[3]. Recognizing the opportunities to economically benefit
from this growth, personal data has been heralded as the 
“New Oil” of the 21st Century [4]. Similarly, researchers
increasingly take advantage of opportunities to use this data.


From the individual’s perspective the latter trend results in a 
tradeoff calculus. 

On the one hand, individuals recognize that many complex 
challenges with societal importance, such as public health 
considerations, market-research or political decision-making 
[5], may benefit from a more rigorous analytic treatment,
thanks to data analytics research and the newly-won abundance 
of personal information. From this perspective, many analytic 
results that are based on individuals’ personal data can be 
interpreted as public goods with societal importance. For 
example, advancements to better understand certain illnesses 
do not only potentially benefit the contributors of personal 
data, but are often made accessible to people in a particular 
domain (e.g., citizens of a country, individuals in a certain
social status or demographic category, or everybody). 

On the other hand, the same individuals have justified 
privacy concerns about the release of their personal data. The 
reasons for privacy concerns can be quite diverse as outlined 
in Solove’s privacy taxonomy [6]. Individuals may perceive
the release and use of their data as an intrusion of their
personal sphere [7], [8], or as a violation of their dignity [9]. In
addition, they may fear this data can be abused for unsolicited 
advertisements, or social and economic discrimination (e.g.,
[10]–[12]).


The published studies demonstrate the need to organize the
collection of personal data in light of this tradeoff faced by
users, by implementing effective control and participation
mechanisms. It has been shown that a majority of individuals 
consider it important to be able to exercise control over
the release of their personal data [13]. For example, a number
of empirical studies have provided evidence for such desires 
for control in the medical domain m-m- Moreover, even 
if data privacy provisions are met, many respondents would 
still require notice and consent over their medical data release 
[14]–[16]. Finally, several studies show a high overall concern
for certain data releases. For example, a meta-review of 
published surveys showed that in some contexts a majority of 
respondents were entirely uncomfortable with health research 
if effective notice and consent practices were absent 03- 
Similar findings can be shown for other problem domains. 








B. Problem Statement and Approach 

Our research addresses the problem area identified in the
above section. In this paper, we study individuals’ incentives
to participate in data analysis projects. These individuals face a
tradeoff between incurring a privacy cost associated with their data
release and deriving benefits from the analysis’ results.

We are particularly motivated by the scenario when data 
about individuals is already stored in a secure database for a 
different primary purpose (e.g., social networking or medical 
services). An analyst can then request the participation of 
individuals in a data analysis project (via a notice and consent 
process with negligible cost) that provides a public good. More 
precisely, individuals make decisions about the release of a 
private value given a population-relevant metric. The analyst 
has the objective of accurately estimating the associated population
average for all individuals.

Our main focus is on understanding the incentives of 
individuals to participate, and of the analyst to shape this 
decision-making process. From each individual’s perspective, 
control over participation takes two forms: 1) individuals can 
contribute data at a self-chosen level of precision, and 2)
individuals can decide whether they want to contribute at
all. From the analyst’s perspective, we investigate
to which degree the research analyst has flexibility to set 
requirements for data precision, so that individuals are still 
willing to contribute to the project, and the quality of the 
estimation improves. 

Our work assumes that incentives for participation are non-monetary;
that is, the main driver for data contributions is the
interest in the derived public good. We base this assumption
on the observation that direct monetary compensation for
personal information has so far received very little traction
in the market for personal information, and that it meets little
acceptance in consumer surveys.¹

We follow a game-theoretic approach to investigate the 
outlined trade-off calculus. We iteratively develop a model, 
where the starting point is a simplified version of the work 
by [18], that captures the interaction between an analyst and
a set of individuals who have control over the release of 
information to the analyst. We conduct a rigorous analysis and 
derive concrete results about the precision of contributions, the 
quality of the population estimate, and the overall willingness 
to contribute to the project. 

C. Contributions 

In this paper, we consider critical facets of realistic privacy
decision-making, striking a good balance between model
complexity and potential impact. We rigorously analyze a 
general model where users optimize a cost composed of an 
individual privacy cost and an estimation cost that captures the 
public good component of the analyst’s estimation, both given 
by arbitrary functions satisfying relatively mild assumptions. 

¹ While related empirical data is sparse, a survey reported that only about
25% of the surveyed population would accept monetary compensation for personal
information [12]. In contrast, offering discounts or free products/services
for personal information is a common practice.


In particular, we consider a general case with a continuous 
privacy cost function which allows users to choose a privacy 
level in a continuum of choices (and not simply a 0-1 choice). 
We first analyze the homogeneous agents case, and then 
we extend our results to the case of heterogeneous agents, 
providing in detail the actions the analyst should take in order 
to improve the estimation. Evidence that privacy concerns 
are heterogeneous is a particularly central cornerstone of the 
privacy literature [19], and such an extension is fundamental
for the applicability of the model. 

For both the homogeneous and the heterogeneous case, we 
determine Nash equilibria indicating the number of contributors
and the optimal contribution levels by the individuals.
We further prove that the analyst can increase the population 
estimate’s accuracy simply by imposing a lower bound on the 
precision of the data that users can reveal (i.e., by restricting 
the level of precision of data contributions). While, for a fixed 
population of users providing data, increasing the precision 
of each data point clearly improves the population estimate’s 
precision, the surprising and important aspect of our result lies 
in that the scheme remains incentive compatible, i.e., users 
are still willing to provide data with a higher precision rather 
than dropping out. We also show how to tune the minimum 
precision level the analyst should set in order to optimize the 
population estimate’s accuracy. In our numerical simulations, 
we find a maximum improvement of the population estimate’s 
accuracy on the order of 20–40%.

We further provide extensions of our modeling framework. 
First, we discuss a two-stage game in which the analyst 
may first recruit participants that commit to provide private 
data with a minimum precision; and only in a second stage, 
these agents would be asked to disclose their information. 
This captures scenarios in which agents are recruited for 
specific studies. Second, we also address the issue of costly 
acquisition of agents and their data for analysis purposes. 
While the no-cost-per-agent assumption we make throughout 
the remainder of the paper is a standard approach in most of 
the literature on public goods, we believe that certain practical 
scenarios require the appreciation of cost considerations, and 
this extension further completes our framework. 

Our results provide a widely applicable method to increase 
the provision of a public good above voluntary contributions, 
simply by restricting the agents’ strategy spaces. This method 
is attractive for its simplicity compared, for instance, to other
schemes that involve monetary transfers, and could find use
in other public good contexts.

Understanding the trade-off between privacy, the quality 
of data analysis results, and willingness-to-participate in such 
projects is of current and growing importance. Analysts should 
not rely on overly broad or ineffective (take-it-or-leave-it) 
notice and consent procedures that do not accurately reflect 
individuals’ preferences. In many privacy-sensitive scenarios 
such as those involving medical data, it is particularly unethical to
deprive individuals of their opportunities to make decisions 
about their data, and whether they want to be involved in 
certain analysis projects. However, better insights about the
involved incentive structures are needed to guide public policy
and advancements of privacy-aware data analysis. 

Preliminary versions of some of the results presented in
this paper appeared in our short paper [20], in the context of a
simplified model with monomial privacy cost, linear estimation
cost and homogeneous agents. Here, we provide results for
the general framework introduced above that relaxes such assumptions,
we provide detailed results of practical importance
on how the analyst should optimally select the minimum
precision level, and we provide several further extensions. In
Section IV-C we also provide more detailed results in the
simplified setting of [20], to qualitatively illustrate the results
of the present paper.

D. Roadmap 

Our paper is structured as follows. In Section II we
review related work. We develop and describe our model in
Section III. We conduct our analysis in detail in Section IV on
a canonical case of homogeneous agents. We extend the results
to heterogeneous agents in Section V. We discuss extensions
to our model in Section VI and conclude in Section VII. All
proofs are relegated to the Appendix.

II. Related Work

Our model draws on different lines of research including
work on privacy in the context of data analytics, and game
theoretic and public goods models. We also briefly review
technical and cryptographic approaches, and behavioral research
on control and data sharing.

Research on the optimal design of experiments assumes
that the analyst can already influence the stage of data
collection in order to improve the learning of a linear
model [21], [22]. In this paper, we allow the analyst to require
data contributions at a certain level of precision to improve
the computation of a population estimate, which is a related
concept. Optimal design of experiments has been studied
from the perspective of incentives [23], or with the scope of
obtaining an unbiased estimator [24]. We propose to improve
the design of experiments focusing on the privacy concerns of
the agents.

Privacy-preserving techniques in the context of data analytics
have a long history. Some recent papers propose new
approaches, which allow users to protect their privacy by selling
aggregates of their data [25], [26]. The more classical framework
of $\epsilon$-differential privacy [27], [28] assumes that data are
perturbed after an analysis has been conducted on unmodified
inputs. That is, the analyst is considered trustworthy. In this
framework, researchers have also studied the role of incentives
[29]–[32]. Our work differs, as we assume agents to be releasing
their data independently, and an untrusted data analyst
which motivates perturbations of data before submission. The 
idea of affecting the level of precision of released personal
data by adding noise in advance of data analysis has been studied
in the context of privacy-preserving data mining (see, e.g.,
[33], [34]) and specific application scenarios such as building
decision trees [35], clustering [36], and association rule mining
[37]. More recently, bounds have been derived on generic
information-theoretic quantities and statistical estimation rates 
under a local privacy model which preserves the privacy of 
agents even from the learner (similarly to adding noise before 
revealing data) [38]. 

Recent work has also studied the combinatorial optimization
problem when an analyst may buy unbiased samples
of data from different providers with given but potentially
heterogeneous variance-price combinations [39]. In another
recent working paper, analysts can access unbiased samples
of private data by compensating data subjects for their data
release according to their preferences [40]. Those studies are
complementary to our work in which data subjects individually 
decide in a game-theoretic framework on the degree of data 
accuracy given a trade-off between their privacy and the 
determination of a socially valuable population estimate. 

From a mechanism design perspective, scenarios have been
studied where survey subjects are assumed to potentially
misreport their private values [41], [42]; however, these behaviors
are not studied in the context of a non-cooperative
scenario. A mechanism design perspective is taken in [43],
where the authors introduce monetary payments to create
incentives for agents to give high quality data. Here, we
do not consider monetary payments. A strategic approach is
followed in [18], where an analyst performs a linear regression
based on users’ perturbed data. The authors in [18] treat the
estimation accuracy as a public good and study the equilibrium
accuracy achieved without introducing monetary payments and
the resulting price of anarchy. Our starting point is a simplified
version of the model in [18]. We continue this line of research
by studying the benefits of restricting potential perturbation 
on the population estimate accuracy, and the incentives for 
participation in a game-theoretic framework. 

Our research is also relevant to the context of the provisioning
of public goods [44]. Our results show a new way of
increasing the public good provision by restricting the agents’
possible actions, as opposed to using monetary incentives. In
addition, studies on interdependent privacy, which capture the
idea that data sharing by one agent impacts the privacy of other
connected agents, are complementary to our work [45], [46].
We model the scenario when sharing creates privacy risks for
individuals, but positive benefits for all agents. 

The aforementioned theoretical works are complemented by
technical approaches (which do not utilize insights from game
theory) such as secure hardware-based private information
retrieval which can be applied, for example, in the context
of online behavioral advertisement [47]; see also other approaches
for privacy-preserving online targeted advertisements
[48]. Similarly, multi-party secure computation has been used
to facilitate the fitting of logistic regression when data are
held by separate parties [49], and homomorphic encryption has
been applied to the scenario of linear regression [50]. Secure-computation
notions of privacy have also been used in combination
with game theory for privacy-preserving mechanism
design [51], [52].

To facilitate the privacy negotiation process between a data
subject and an analyst, different technical protocols have been
proposed. Several works are connected to the Platform for Privacy
Preferences Project (P3P), which offers a protocol allowing
data collectors (e.g., websites) to declare their intended use
of information they collect about data subjects [53], and also
provides agent tools for the user to manage those data requests
[54], [55]. More recent work, for example, addresses specific
problem areas such as personalization [56]. Those mechanisms
allow for user-specified policies regarding participation, but
also minimum requirements for (not necessarily truthful) data
sharing as specified by the analyst.

Research on user preferences and behaviors with respect to 
privacy has produced several results relevant to the context of 
our work. A survey study has shown that over 90% of the 
respondents agreed with the definition of privacy as control 
of personal information [12], which presumably would include
an interest to decide over the participation in data analysis
projects. In hypothetical scenarios, individuals typically report
high attitudinal valuations for their private data [57]. However,
in experiments with actual private data transfers researchers
have observed low thresholds for the release of such data in
exchange for free services/goods or discounts [19], [58], [59].
A root cause for this privacy dichotomy is the complexity 
of understanding personal information exchanges and their 
consequences m 

The intricacies of human decision making have also been 
studied specifically focusing on the notion of control over 
information exchanges. Laboratory and online experiments 
have shown that control options have to be added with care 
to practically relevant scenarios [60]–[62]. For example, such
options can elevate individuals’ propensity to engage in riskier
disclosures because their mere presence can contribute to a
lowering of concerns over privacy [60]. Another experimental
study found that allowing individuals to customize personal
data exchanges does not increase the number of transactions
even though individuals were able to exclude unwanted aspects
of those transactions [63]. Overall, the understanding of the involved
attitudes and behaviors is still work in progress. In our
paper, we propose a process that is relatively straightforward
to implement and to understand from a user perspective. However,
approaches that fully accommodate the stated behavioral
hurdles remain the subject of future work for behavioral as 
well as theoretical scientists. 


III. The Model 

In this section, we present our model in detail. We describe 
the strategic interaction between the individuals (which we 
also refer to as agents), whose information is contained in 
a data repository, and how the analyst, wishing to observe 
the data and to perform a statistical analysis, may modify the 
estimation by varying selected parameters. The linear model 
approach we take here builds on the work of [18].

A. The Data Repository of Personal Data 

Let $N = \{1, \ldots, n\}$ denote the set of agents whose
personal data are contained in the data repository. In particular,
we suppose that each agent $i \in N$ is associated with a
private variable $y_i \in \mathbb{R}$, which contains sensitive information.
Throughout our analysis, we suppose that there exists $y_M \in \mathbb{R}$
s.t. the private variables are of the form

$$y_i = y_M + \epsilon_i, \quad \forall i \in N, \tag{1}$$

where the $\epsilon_i$ are i.i.d., zero-mean random variables with finite
variance $\sigma^2 < \infty$, which capture the inherent noise. We
stress that we make no further assumptions on the noise; in
particular, we do not assume it is Gaussian. As a result, our
model applies to a wide range of statistical inference problems,
even cases where the distribution of variables is not known.

Parameter $y_M$ represents the mean of the private variables
$y_i$, and its knowledge is valuable to the analyst, for example
as it allows her to predict the private variable of any agent
whose data cannot be known (because it is not contained in the
repository at that given moment, is kept private by its owner, is
not accessible due to limited computing resources, etc.). The
analyst wishes to observe the available private variables $y_i$ and
to compute their average as an estimate of $y_M$. In our model,
we suppose that the analyst does not know the mean $y_M$ that
she wishes to estimate, but she knows the variance $\sigma^2$. We argue
that observing the variability of an attribute in a population is
easier than estimating the mean, both for the analyst and for
the population (in [64], for example, the authors show how
individuals value their age and weight information according
to the relative variability).


B. The Precision and the Analyst’s Estimation 

We suppose that the analyst cannot directly access the 
private variables, rather she needs to ask the agents for their 
consent to be able to retrieve the information. As such, the 
agents have full control over their own private variables, and 
they have the choice to authorize or to deny the analyst’s 
request. In particular, if wishing to contribute, but concerned 
about privacy, an agent can authorize the access to a perturbed 
value of the private variable. The perturbed variable has
the form $\tilde{y}_i = y_i + z_i$, where $z_i$ is a zero-mean random
variable with variance $\sigma_i^2$. We assume that the $\{z_i\}_{i \in N}$ are
independent and are also independent of the inherent noise
variables $\{\epsilon_i\}_{i \in N}$. In practice, the agent chooses a given
precision $\lambda_i$, which corresponds to the inverse of the aggregate
variance (inherent noise, plus artificially added noise) of the
perturbed variable $\tilde{y}_i$, i.e.,

$$\lambda_i = 1/(\sigma^2 + \sigma_i^2) \in [0, 1/\sigma^2], \quad \forall i \in N.$$

In the choice of the precision level, we have the following two 
extreme cases: 

(i) when $\lambda_i = 0$, agent $i$ has very high privacy concerns.
This corresponds to adding noise of infinite variance or,
equivalently, this represents the fact that agent $i$ denies
the access to her data;

(ii) when $\lambda_i = 1/\sigma^2$, agent $i$ has very low privacy concerns.
This corresponds to authorizing the access to the real
private variable $y_i$, without adding any additional noise
to the data.
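The mapping between a chosen precision and the amount of artificial noise can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `added_noise_variance` and `perturb` are hypothetical, and drawing the added noise from a Gaussian is an assumption made only for the example (the model requires only a zero-mean variable with the right variance).

```python
import math
import random


def added_noise_variance(precision, sigma2):
    """Variance of the artificial noise z_i implied by a chosen precision.

    Inverts lambda_i = 1 / (sigma2 + sigma_i^2). A precision of 0 means
    the agent withholds the data (infinite noise variance).
    """
    if not 0.0 <= precision <= 1.0 / sigma2:
        raise ValueError("precision must lie in [0, 1/sigma2]")
    if precision == 0.0:
        return math.inf
    # Clamp at 0 to absorb floating-point round-off at the upper end.
    return max(0.0, 1.0 / precision - sigma2)


def perturb(y, precision, sigma2, rng):
    """Return y + z with zero-mean z; Gaussian z is an illustrative choice."""
    var = added_noise_variance(precision, sigma2)
    if math.isinf(var):
        return None  # the agent denies access
    return y + rng.gauss(0.0, math.sqrt(var))
```

For instance, with `sigma2 = 1.0` a precision of `0.5` corresponds to added noise of variance `1.0`, while the maximum precision `1.0` returns the private value unchanged.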




The strategy set $[0, 1/\sigma^2]$ contains all the possible choices
for agent $i$: denying, authorizing, or any intermediate level of
precision (which captures a wide range of privacy concerns
as documented in behavioral studies [19]). We denote by $\boldsymbol{\lambda} =
[\lambda_i]_{i \in N}$ the vector of the precisions.

Once each agent $i \in N$ has made her choice about the
level of precision $\lambda_i$ and, consequently, the perturbed variable
$\tilde{y}_i$ has been computed, the analyst has access to both the set of
precisions and the set of perturbed variables. Then, the analyst
estimates the mean as

$$\hat{y}_M(\boldsymbol{\lambda}) = \frac{\sum_{i \in N} \lambda_i \tilde{y}_i}{\sum_{i \in N} \lambda_i}, \tag{2}$$

where perturbed variables with higher precision (i.e., smaller
variance) receive a larger weight. This estimator is the standard
generalized least squares estimator. It minimizes a weighted
square error in which the i-th term is weighted by the precision
of the perturbed variable $\tilde{y}_i$. This estimator is unbiased, i.e.,
$E[\hat{y}_M] = y_M$, and has variance

$$\sigma_M^2(\boldsymbol{\lambda}) = E[(\hat{y}_M(\boldsymbol{\lambda}) - y_M)^2] = \frac{1}{\sum_{i \in N} \lambda_i} \in [\sigma^2/n, +\infty]. \tag{3}$$
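The variance expression in (3) follows in one step from the independence of the perturbed variables. A short derivation, using only the definitions above and written for agents with $\lambda_i > 0$ (agents with zero precision receive zero weight and drop out of both sums):

```latex
\begin{align*}
\operatorname{Var}(\tilde{y}_i) &= \sigma^2 + \sigma_i^2 = \frac{1}{\lambda_i}, \\
\operatorname{Var}\bigl(\hat{y}_M(\boldsymbol{\lambda})\bigr)
  &= \frac{\sum_{i \in N} \lambda_i^2 \operatorname{Var}(\tilde{y}_i)}
          {\bigl(\sum_{j \in N} \lambda_j\bigr)^2}
   = \frac{\sum_{i \in N} \lambda_i}{\bigl(\sum_{j \in N} \lambda_j\bigr)^2}
   = \frac{1}{\sum_{j \in N} \lambda_j}.
\end{align*}
```

Since each $\lambda_i \le 1/\sigma^2$, the sum of precisions is at most $n/\sigma^2$, which gives the lower end $\sigma^2/n$ of the range.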


In our model, the analyst aims at estimating the mean $y_M$,
e.g., to be able to predict some additional private variables. 
Then, it is reasonable to assume that the analyst would use this 
estimator, as it is “good” for several reasons. In particular, it 
coincides with the maximum-likelihood estimator for Gaussian 
noise and, most importantly, it has minimal variance amongst 
the linear unbiased estimators for arbitrary noise distributions. 
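As a concrete illustration of the precision-weighted estimator and its variance, the following sketch computes both from a list of perturbed values and a list of precisions. The function names are hypothetical, and the convention of passing an arbitrary placeholder value (for example `0.0`) for agents with precision 0 is an assumption of this example.

```python
def gls_estimate(perturbed, precisions):
    """Precision-weighted mean of the perturbed values, as in (2)."""
    total = sum(precisions)
    if total == 0.0:
        raise ValueError("no agent contributes: the mean cannot be estimated")
    return sum(l * y for l, y in zip(precisions, perturbed)) / total


def estimator_variance(precisions):
    """Variance of the estimator: 1 / (sum of precisions), as in (3)."""
    total = sum(precisions)
    return float("inf") if total == 0.0 else 1.0 / total
```

With equal precisions the estimate reduces to the plain average, and an agent with precision 0 has no influence on the result.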

In the estimation, we have the following two extreme cases: 

(i) when $\lambda_i = 0$ for each $i \in N$, the variance (3) is infinite.
This corresponds to the situation in which each agent
denies the access to her data, and then the analyst cannot
estimate $y_M$;

(ii) when $\lambda_i = 1/\sigma^2$ for each $i \in N$, the analyst estimates
$y_M$ with variance $\sigma^2/n$, resulting only from the inherent
noise. This corresponds to the situation in which each
agent is authorizing the access to her data with maximum
precision, i.e., no agent is perturbing her private variable.

For any precision vector in $[0, 1/\sigma^2]^n$, the variance of the estimate
will be in $[\sigma^2/n, +\infty]$. The set of precision vectors for which
the estimator has a finite variance is $[0, 1/\sigma^2]^n \setminus \{(0, \ldots, 0)\}$.


C. The Estimation Game $\Gamma$

We next describe the interaction between the agents that
results in their choices of precisions. We assume that each
agent $i \in N$ wishes to minimize a cost function $J_i :
[0, 1/\sigma^2]^n \to \mathbb{R}_+$ s.t., for each $\boldsymbol{\lambda} \in [0, 1/\sigma^2]^n$,

$$J_i(\lambda_i, \boldsymbol{\lambda}_{-i}) = c_i(\lambda_i) + f(\boldsymbol{\lambda}), \tag{4}$$

where we use the standard notation $\boldsymbol{\lambda}_{-i}$ to denote the collection
of actions of all agents but $i$. The cost function $J_i$ of
agent $i \in N$ comprises two non-negative components. The
first component $c_i : [0, 1/\sigma^2] \to \mathbb{R}_+$ represents the privacy
attitude of agent $i$, and we refer to it as the privacy cost:
it is the (perceived or actual) cost that the individual incurs
on account of the privacy violation sustained by revealing the
private variable perturbed with a given precision. The second
component $f : [0, 1/\sigma^2]^n \to \mathbb{R}_+$ is the estimation cost, and
we assume that it takes the form $f(\boldsymbol{\lambda}) = F(\sigma_M^2(\boldsymbol{\lambda}))$, where
$F : [\sigma^2/n, +\infty) \to \mathbb{R}_+$, if the variance is finite, and $+\infty$
otherwise. It represents how well the analyst can estimate the
mean $y_M$ and it captures the idea that it is not only in the
interest of the analyst, but also of the agents, that the analyst
can determine an accurate estimate of the population average
$y_M$.

In our model, the accuracy of the estimate can be understood
as a public good, to which each user contributes with her
choice of precision $\lambda_i$, at a given privacy cost. From this
perspective, the assumption that the estimation cost is the same
for all agents mirrors the standard assumption in the
public good literature. Throughout our analysis, we make two
additional assumptions:

Assumption 1: The privacy costs $c_i : [0, 1/\sigma^2] \to \mathbb{R}_+$, $i \in N$,
are twice continuously differentiable, non-negative, non-decreasing,
strictly convex and s.t. $c_i(0) = c_i'(0) = 0$.

Assumption 2: Function $F : [\sigma^2/n, +\infty) \to \mathbb{R}_+$ is twice
continuously differentiable, non-negative, non-decreasing and
strictly convex.

To describe the strategic interaction between the agents, we
define the estimation game $\Gamma = \langle N, [0, 1/\sigma^2]^n, (J_i)_{i \in N} \rangle$ with
set of agents $N$, strategy space $[0, 1/\sigma^2]$ for each agent $i \in N$
and cost function $J_i$ given by (4).
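The cost in (4) can be written out directly. The sketch below is illustrative only: the function names are hypothetical, and the quadratic forms `c_example` and `F_example` are assumptions chosen because they satisfy Assumptions 1 and 2, not functions specified by the model.

```python
import math


def estimation_cost(precisions, F):
    """f(lambda) = F(variance) if the variance is finite, +inf otherwise."""
    total = sum(precisions)
    return math.inf if total == 0.0 else F(1.0 / total)


def agent_cost(i, precisions, privacy_costs, F):
    """J_i(lambda) = c_i(lambda_i) + f(lambda), as in (4)."""
    return privacy_costs[i](precisions[i]) + estimation_cost(precisions, F)


# Hypothetical example functions satisfying Assumptions 1 and 2.
def c_example(lam):
    return lam ** 2


def F_example(var):
    return var ** 2
```

With two agents both choosing precision `0.5`, the estimator variance is `1.0`, so each agent's cost is `0.25 + 1.0 = 1.25`; if nobody contributes, every agent's cost is infinite.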


D. The Modified Estimation Game $\Gamma(S, \eta)$

As we shall see (Section IV-A), game $\Gamma$ has a unique Nash
equilibrium for which the variance of the estimate is larger
than the optimal one ($\sigma^2/n$) due to the excess noise added
by agents to protect their privacy. We further investigate the
situation in which the analyst can modify the game and try
to mitigate the effect of agents’ privacy concerns in order to
reduce the estimation cost (i.e., to improve the accuracy of the
estimate obtained). Specifically, the analyst can implement
the following two variations of the model. First, she can choose
a minimum precision level $\eta \in [0, 1/\sigma^2]$, which is equivalent
to fixing a maximum variance for the noise that agents can
add to perturb their data. As it is not practically possible to
force agents to authorize the access to their data with a given
precision, we still assume that the agents can choose to deny
the authorization, which is equivalent to selecting a precision
level equal to zero. Second, the analyst can request access
to the personal data of only a subset $S \subseteq N$ of agents, with
$s = |S|$ (for example, excluding those agents who are the most
concerned about privacy).

In the modified game, the agents are informed of the subset
of individuals who are asked to reveal their personal data,
and of the minimum precision level $\eta$. They choose their
precision $\lambda_i$ in the range imposed by the analyst, $[\eta, 1/\sigma^2]$,
or decide to deny the access, i.e., select a precision
equal to 0. To analyze the strategic interaction between the
agents in this variation, we define the game $\Gamma(S, \eta) =
\langle S, [\{0\} \cup [\eta, 1/\sigma^2]]^s, (J_i)_{i \in S} \rangle$ (where the cost function $J_i$
is still given by (4)), which is identical to $\Gamma$, except for the
restricted set of agents and the restricted strategy space.

Observe that the original game $\Gamma$ is a special case of this
modified game $\Gamma(S, \eta)$, when $S = N$ and $\eta = 0$. We analyze
the games $\Gamma$ and $\Gamma(S, \eta)$ as complete information games
between the agents, i.e., we assume that the set of agents,
the action sets (in particular, when present, the value of the
parameter $\eta$) and the costs are known by all the agents.
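To make the restricted strategy space concrete, the following sketch computes one agent's best response in the modified game when the total precision of the other agents is held fixed. It is an illustration under stated assumptions rather than the procedure used in the paper: the function name and the use of ternary search are choices of this example, which relies on the agent's cost being convex in her own precision on $[\eta, 1/\sigma^2]$ (this holds under Assumptions 1 and 2, since $F$ is convex and non-decreasing and $x \mapsto 1/x$ is convex).

```python
import math


def restricted_best_response(others_total, eta, sigma2, c, F, iters=200):
    """Best response of one agent in the modified game.

    The agent picks a precision in {0} U [eta, 1/sigma2] minimising
    c(lam) + F(1 / (lam + others_total)).
    """
    def cost(lam):
        total = lam + others_total
        return c(lam) + (math.inf if total == 0.0 else F(1.0 / total))

    lo, hi = eta, 1.0 / sigma2
    # Ternary search: the cost is convex in lam on [eta, 1/sigma2].
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if cost(m1) <= cost(m2):
            hi = m2
        else:
            lo = m1
    participate = 0.5 * (lo + hi)
    # Opting out (precision 0) is always allowed; keep it if it is cheaper.
    return participate if cost(participate) <= cost(0.0) else 0.0
```

With the hypothetical quadratic costs $c(\lambda) = \lambda^2$ and $F(v) = v^2$, $\sigma^2 = 1$ and fifteen other agents each contributing `0.125`, the agent's best response with $\eta = 0$ is approximately `0.125`, whereas a very demanding bound such as $\eta = 0.9$ makes opting out the cheaper choice.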

IV. The Homogeneous Agent Case 

In this section, we detail the analysis in the symmetric case 
where all the agents have identical privacy concerns, i.e., we 
assume that the privacy cost functions of all agents are the 
same: $c_i(\cdot) = c(\cdot)$ for each $i \in N$. This special case highlights
the key aspects of our approach and provides some interesting
preliminary results that yield intuitive interpretations. We will
generalize our results to the heterogeneous case in Section V.

A. The Estimation Game in the Homogeneous Case 

We first analyze the estimation game $\Gamma$, in which all the agents in $N$ are playing and the analyst allows them to choose any precision level between 0 and $1/\sigma^2$. A Nash equilibrium (in pure strategy) of this game is a strategy profile $\boldsymbol{\lambda}^* \in [0, 1/\sigma^2]^n$ satisfying

$$\lambda_i^* \in \operatorname*{arg\,min}_{\lambda_i \in [0, 1/\sigma^2]} J_i(\lambda_i, \boldsymbol{\lambda}_{-i}^*), \quad \forall i \in N. \tag{5}$$

The game $\Gamma$ with strategy space $[0, 1/\sigma^2]$ is a special case of the game in [18], where the existence of a unique Nash equilibrium is established. However, our specific assumptions allow us to characterize the equilibrium in more detail:

Theorem 1: The game $\Gamma$ has a unique Nash equilibrium $\boldsymbol{\lambda}^*$ s.t. $\lambda_i^* = \lambda^* > 0$ for each $i \in N$.

The proof of this result exploits the fact that game $\Gamma$ is a potential game to characterize the Nash equilibrium. Interestingly, we observe that non-participation by everybody, i.e., $\boldsymbol{\lambda} = (0, \dots, 0)$, cannot be an equilibrium. Indeed, as the estimation cost diverges at $\boldsymbol{\lambda} = (0, \dots, 0)$, every agent has a profitable deviation from this point since contributing any positive $\lambda_i$ brings the estimation cost down to a finite value. Note, however, that this is not an artifact of the model, as it remains true if we assume that the estimation cost is bounded but large enough to exceed the privacy cost.

We observe that, as a consequence of the symmetry of the game in the homogeneous case, all the agents at equilibrium choose the same precision level, which is a function $\lambda^* = \lambda^*(n)$ of the total number of agents $n$. Then, from the discussion above, it is clear that $\lambda^*$ cannot be zero, so that all agents contribute a positive precision.

Due to the arbitrariness of the functions $F(\cdot)$ and $c(\cdot)$, the unique Nash equilibrium cannot be written in closed form. However, it is easily computable in practice either as the minimum of the potential function (which is convex) or as the unique solution of the following fixed point problem:

$$\lambda = g(n, \lambda),$$

where function $g : \mathbb{N}^* \times [0, 1/\sigma^2] \to [0, +\infty]$ is defined for each $\lambda \in (0, 1/\sigma^2]$ and for each $n \in \mathbb{N}^*$ as

$$g(n, \lambda) = \min\left\{ \sqrt{\frac{F'\left(\frac{1}{n\lambda}\right)}{n^2 c'(\lambda)}},\ 1/\sigma^2 \right\},$$

and is defined by continuity as $\lim_{\lambda \to 0^+} g(n, \lambda)$ for $\lambda = 0$ and for each $n \in \mathbb{N}^*$.
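As an illustration of how the symmetric equilibrium can be computed numerically, the sketch below solves the symmetric first-order condition by bisection. It assumes that the estimation cost of a profile is $F(\sigma_M^2)$ with $\sigma_M^2 = 1/\sum_i \lambda_i$, that $c'$ is increasing and that $F$ is convex and increasing, so that the marginal gap changes sign only once. The function name `symmetric_equilibrium` and its arguments are hypothetical and not part of the original analysis.

```python
def symmetric_equilibrium(n, c_prime, F_prime, lam_max, tol=1e-12):
    """Common equilibrium precision of the n-agent homogeneous game.

    Assumption: the estimation cost is F(1 / sum(lam)), so the symmetric
    first-order condition reads c'(lam) = F'(1 / (n*lam)) / (n*lam)**2.
    lam_max plays the role of the upper bound 1 / sigma^2.
    """
    def marginal_gap(lam):
        # Marginal privacy cost minus marginal estimation benefit.
        total = n * lam
        return c_prime(lam) - F_prime(1.0 / total) / total ** 2

    # If the benefit still dominates at the upper bound, the bound binds.
    if marginal_gap(lam_max) <= 0.0:
        return lam_max

    # Otherwise the gap is negative near 0 and positive at lam_max.
    lo, hi = 0.0, lam_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if marginal_gap(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Monomial privacy cost c(lam) = lam**2 and linear estimation cost F(v) = v.
    print(symmetric_equilibrium(10, lambda lam: 2.0 * lam, lambda v: 1.0, 2.0))
```

Bisection is used here only because it needs no derivative of the gap; minimizing the convex potential function would be an equally valid route.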

Given the unique Nash equilibrium $\lambda^*(n)$, the variance (in Equation (3)) of the estimate of $\mu$ obtained by the analyst at equilibrium is also a function of $n$, and given by the following expression:

$$\sigma_M^2(\lambda^*(n)) = \frac{1}{n \lambda^*(n)}. \tag{6}$$

In Propositions 1 and 2 below, we derive the properties of the equilibrium precision and of the corresponding variance, when the number of agents varies.

Proposition 1: The equilibrium precision level $\lambda^*(n)$ satisfies:

(i) $\lambda^*(n)$ is a non-increasing function of the number $n$ of agents, and

(ii) $\lim_{n \to +\infty} \lambda^*(n) = 0$.

Proposition 1 states that the equilibrium contribution of each agent decreases as the number of agents increases (Part (i)). This is a standard property in public good problems, as agents choose their equilibrium contribution such that the marginal increase in the contribution cost equals the marginal decrease in the estimation cost, and the marginal effect of a single agent decreases when the number of agents increases. Proposition 1 (ii) shows that, in the limit when $n$ becomes very large, the contribution of each agent tends to zero (i.e., each agent adds a variance tending to infinity to her data). It is interesting to notice that, given that the equilibrium precision level $\lambda^*(n)$ goes to zero as $n$ goes to infinity, the variance (6) cannot decrease as $1/n$, as it does in the standard case of the empirical mean of iid random variables of equal variance. This is because, here, the variance of each data point (or random variable) increases as the number of points increases. Yet, as the next proposition shows, the variance of the mean's estimate is still non-increasing.

Proposition 2: The equilibrium variance of the estimate of $\mu$ satisfies:

(i) $\sigma_M^2(\lambda^*(n))$ is a non-increasing function of the number of agents $n$, and

(ii) $\lim_{n \to +\infty} \sigma_M^2(\lambda^*(n)) = 0$.

Proposition 2 (i) shows that, for the analyst, it is always better to have a larger number of agents giving data despite the fact that, when the number of agents increases, each agent gives data with smaller precision (see Proposition 1). Proposition 2 (ii) analyzes the case of a large number of agents $n$. Interestingly, when $n$ gets large, the variance goes to zero, though at a rate slower than $1/n$, as mentioned above. (We give an expression of the rate in Section IV-C for special functions $F$ and $c$.)






B. The Modified Estimation in the Homogeneous Case 

We now move to the case where the analyst can restrict the set of agents, thereby asking to access the data of only a subgroup of them, and potentially introducing a minimum precision level $\eta \in [0, 1/\sigma^2]$. The final goal is to improve the estimation accuracy; formally, to estimate the mean $\mu$ with a variance strictly smaller than $\sigma_M^2(\lambda^*(n))$. We assume that the set $S \subseteq N$ of agents who can authorize access to their data (i.e., who are solicited by the analyst) is fixed, and we analyze how the estimation varies while moving only the parameter $\eta$. This variant is modeled by the game $\Gamma(S, \eta)$ defined in Section III-D, where $\eta$ is now the only variable of the model. We suppose that the equilibrium precision level for the game $\Gamma(S, 0)$ is s.t. $\lambda^*(s) \neq 1/\sigma^2$ since, otherwise, the estimation would already be optimal with variance $\sigma^2/s$ for $\eta = 0$.

A Nash equilibrium (in pure strategy) of the game $\Gamma(S, \eta)$ is a strategy profile $\boldsymbol{\lambda}^* \in [\{0\} \cup [\eta, 1/\sigma^2]]^s$ satisfying

$$\lambda_i^* \in \operatorname*{arg\,min}_{\lambda_i \in \{0\} \cup [\eta, 1/\sigma^2]} J_i(\lambda_i, \boldsymbol{\lambda}_{-i}^*), \quad \forall i \in S. \tag{7}$$

In the following theorem, we show that, if the analyst chooses a minimum precision level that is not “too big”, the agents are still willing to authorize access to their data at equilibrium. Recall that $S \subseteq N$ denotes the set of agents solicited by the analyst (who are the players of the game $\Gamma(S, \eta)$) and that $s = |S|$ denotes its cardinality.

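Theorem 2: If $s = 1$, then for any $\eta \in [0, 1/\sigma^2]$, $\Gamma(S, \eta)$ has a unique Nash equilibrium $\lambda^*(s, \eta) = \max\{\lambda^*(1), \eta\}$.

If $s > 1$, then there exists a unique parameter $\eta^*(s) \in [0, 1/\sigma^2]$ s.t.:

(i) for any $\eta \in [0, \eta^*(s)]$, $\Gamma(S, \eta)$ has a unique Nash equilibrium $\boldsymbol{\lambda}^*(s, \eta)$, s.t. $\lambda_i^*(s, \eta) = \lambda^*(s, \eta)$ for each $i \in S$, with

$$\lambda^*(s, \eta) = \begin{cases} \lambda^*(s) & \text{if } 0 \le \eta \le \lambda^*(s) \\ \eta & \text{if } \lambda^*(s) < \eta \le \eta^*(s); \end{cases} \tag{8}$$

(ii) for any $\eta \in (\eta^*(s), 1/\sigma^2]$, there does not exist a Nash equilibrium $\boldsymbol{\lambda}^*(s, \eta)$ s.t. $\lambda_i^*(s, \eta) \neq 0$ for each $i \in S$.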

Theorem 2 introduces the quantity $\eta^*(s)$ which, as we will see, is crucial for the analyst. Similarly to $\lambda^*(s)$, the value of $\eta^*(s)$ cannot be written in closed form, but it can be computed as the unique solution of the following fixed point problem:

$$\eta = g(s, \eta),$$

where function $g : \mathbb{N}^* \times [0, 1/\sigma^2] \to [0, +\infty]$ is defined for each $\eta \in (0, 1/\sigma^2]$ and for each $s \in \mathbb{N}^*$ as


$g(s, \eta) = \min$



and is defined by continuity as $\lim_{\eta \to 0^+} g(s, \eta)$ in $\eta = 0$ for each $s \in \mathbb{N}^*$. We can also show that $\lambda^*(s) < \eta^*(s)$ for all $s$ (we obtain this result inside the proof of Theorem 3).


Theorem 2 characterizes the Nash equilibrium for different values of the parameter $\eta$. We observe that, as a consequence of the symmetry of the game, when $\eta \in [0, \eta^*(s)]$, the unique equilibrium of $\Gamma(S, \eta)$ is still symmetric, as it was for the unique equilibrium of the original game $\Gamma$. More specifically, if the analyst sets a minimum precision level $\eta$ smaller than the unique equilibrium precision level $\lambda^*(s)$ of game $\Gamma$, the restriction of the strategy set does not have any effect on the outcome of the game. On the other hand, if the analyst sets a minimum precision level $\eta$ in the interval $(\lambda^*(s), \eta^*(s)]$, all agents are still willing to participate with a precision $\eta > \lambda^*(s)$. This result matches intuition: even though the agents' marginal costs are higher than their marginal benefits (the equilibrium choice is on the border of the strategy space $[\eta, 1/\sigma^2]$), their costs are still lower than if they chose a precision level of zero. Therefore, agents do not have incentives to deviate. In the remaining range $(\eta^*(s), 1/\sigma^2]$, there does not exist an equilibrium such that each agent chooses a non-zero precision level. If there exist Nash equilibria, they are such that a subset $S' \subset S$ of agents choose the non-zero precision level $\lambda^*(s', \eta)$, while the others choose zero. The possible existence of these equilibria is not relevant for our analysis. In fact, such an equilibrium would provide the same estimation that the analyst can obtain by implementing the game $\Gamma(S', \eta)$ and, as we see in the following theorem, the estimation improves by maximizing the number of agents in the game.
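The participation argument above can be turned into a small numerical check. The sketch below is not the fixed-point map $g(s, \eta)$ of the text; it instead assumes that the threshold is the largest $\eta$ at which an agent is indifferent between contributing $\eta$ and contributing zero when the other $s - 1$ agents contribute $\eta$, with estimation cost $F(1/\sum_i \lambda_i)$ (the homogeneous analogue of condition (15)). The name `participation_threshold` and its arguments are hypothetical.

```python
def participation_threshold(s, c, F, lam_max, tol=1e-12):
    """Largest minimum precision at which no agent prefers to drop out.

    Assumptions: all s >= 2 agents share the privacy cost c(.), the
    estimation cost is F(1 / sum(lam)) with F convex and increasing,
    and lam_max stands for the upper bound 1 / sigma^2.
    """
    if s < 2:
        raise ValueError("the indifference condition needs at least two agents")

    def gain_from_staying(eta):
        # Cost after deviating to zero minus cost when contributing eta.
        cost_if_leaving = F(1.0 / ((s - 1) * eta))
        cost_if_staying = c(eta) + F(1.0 / (s * eta))
        return cost_if_leaving - cost_if_staying

    # Staying is still worthwhile at the upper bound: the bound is the answer.
    if gain_from_staying(lam_max) >= 0.0:
        return lam_max

    # The gain is positive near 0 and decreasing, so bisection applies.
    lo, hi = 0.0, lam_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gain_from_staying(mid) >= 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Monomial privacy cost c(lam) = lam**2 and linear estimation cost.
    print(participation_threshold(10, lambda lam: lam ** 2, lambda v: v, 2.0))
```

With these monomial and linear costs the result can be compared against the closed form of Section IV-C, which is a convenient sanity check for the bisection.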

The previous theorem is an important stepping stone allowing us to establish the main result of this section:

Theorem 3: The estimation variance at equilibrium is minimal for $S = N$ and $\eta = \eta^*(n)$. Moreover, we have

$$\sigma_M^2(\lambda^*(n, \eta^*(n))) < \sigma_M^2(\lambda^*(n)),$$

that is, setting a minimum precision level $\eta = \eta^*(n)$ strictly improves the estimation.

Theorem 3 shows that the analyst can indeed improve the quality of the estimation by setting a minimum precision level. It establishes that it is optimal, for the analyst, to solicit access to the private variable of all the agents whose data is contained in the data repository; and it provides the optimal minimum precision level $\eta = \eta^*(n)$ that the analyst should set to maximize the estimation precision. (Recall that $\eta^*(n)$ can be easily computed from the model's parameters by solving a fixed point problem.) Overall, Theorem 3 provides an implementable mechanism through which the analyst can improve the quality of the data provided by each user by imposing restrictions on the variance that users can add. In the next section, we study a special case with simple functions $F(\cdot)$ and $c(\cdot)$ in order to quantify precisely the improvement achieved.

C. The Special Case with Monomial Privacy Costs and Linear 
Estimation Cost 

In this section, we illustrate the results of the previous sections on the special case where the privacy cost is monomial and the estimation cost is linear; i.e., we assume that the cost function in (4) has the form

$$J_i(\lambda_i, \boldsymbol{\lambda}_{-i}) = c \lambda_i^k + \sigma_M^2(\boldsymbol{\lambda}), \tag{9}$$

where $c \in (0, \infty)$ and $k \ge 2$ are constants. Note that, without loss of generality, in the linear estimation cost, we omit the constant term (adding a constant to the cost does not modify the game solutions) as well as the slope factor (adding it would give an equivalent game with constant $c$ rescaled). For this special case, we can determine both the equilibrium precision (without a minimum precision level) and the optimal minimum precision level in closed form. We can then graphically depict how the quantities vary while moving the model parameters, and explicitly compute the estimation improvement. A preliminary analysis of the simplified model with costs as in (9) was provided in our previous work [20]; we provide an extended analysis of this special case here thanks to the results of the previous section.

In the special case of costs given by (9), the equilibrium precision chosen by the agents in the game $\Gamma$ simplifies to:

$$\lambda^*(n) = \begin{cases} \left( \frac{1}{ckn^2} \right)^{\frac{1}{k+1}} & \text{if } \left( \frac{1}{ckn^2} \right)^{\frac{1}{k+1}} \le 1/\sigma^2 \\ 1/\sigma^2 & \text{if } \left( \frac{1}{ckn^2} \right)^{\frac{1}{k+1}} > 1/\sigma^2. \end{cases} \tag{10}$$

As we have seen in the previous section (Theorem 3), it is optimal for the analyst to request access to the data of all agents in $N$. In this special case, the corresponding optimal minimum precision level becomes

$$\eta^*(n) = \begin{cases} \left( \frac{1}{cn(n-1)} \right)^{\frac{1}{k+1}} & \text{if } \left( \frac{1}{cn(n-1)} \right)^{\frac{1}{k+1}} \le 1/\sigma^2 \\ 1/\sigma^2 & \text{if } \left( \frac{1}{cn(n-1)} \right)^{\frac{1}{k+1}} > 1/\sigma^2. \end{cases}$$

Writing explicitly these two key quantities, we can immediately notice that, when $c$ increases, i.e., when the agents are more concerned about privacy, they choose at equilibrium a smaller precision level $\lambda^*(n)$. Further, the minimum precision level $\eta^*(n)$ proposed by the analyst becomes smaller if the agents are more sensitive about the protection of their data. In this special case, the properties of the results for the generic case are easy to spot. For instance, we have $\lambda^*(n) < \eta^*(n)$ for each $n \in \mathbb{N}^*$, and both of these quantities decrease and go to zero when $n$ increases and goes to $+\infty$.

Most interestingly, the closed-form expressions that we have for this special case allow us to analyze the rate of decrease of the variance, and to quantify the improvement that can be achieved by imposing a minimum precision level. For $n$ large enough (such that both $\lambda^*(n)$ and $\eta^*(n)$ are strictly smaller than $1/\sigma^2$), the variance at equilibrium level $\lambda^*(n)$ of game $\Gamma$ is given by

$$\sigma_M^2(\lambda^*(n)) = \frac{1}{n \lambda^*(n)} = (ck)^{\frac{1}{k+1}} \, n^{-\frac{k-1}{k+1}},$$

while the variance at equilibrium level $\lambda^*(n, \eta^*(n))$ of game $\Gamma(N, \eta^*(n))$, where the optimal minimum precision level is set, is given by

$$\sigma_M^2(\lambda^*(n, \eta^*(n))) = \frac{1}{n \eta^*(n)} = c^{\frac{1}{k+1}} \, (n-1)^{\frac{1}{k+1}} \, n^{-\frac{k}{k+1}}.$$

Both appear to have the same rate of decrease in $n^{-\frac{k-1}{k+1}}$, which is slower than $n^{-1}$ but becomes closer to $n^{-1}$ as $k$ tends to infinity. Intuitively, as the privacy cost becomes closer to a step function, the equilibrium precision level becomes less dependent on the number of agents so that we get closer to the case of averaging iid random variables of fixed variance. Consequently, for $n$ large enough, the improvement is given by a factor:

$$\frac{\sigma_M^2(\lambda^*(n))}{\sigma_M^2(\lambda^*(n, \eta^*(n)))} = \left( \frac{kn}{n-1} \right)^{\frac{1}{k+1}}, \tag{11}$$

which asymptotically becomes constant:

$$\frac{\sigma_M^2(\lambda^*(n))}{\sigma_M^2(\lambda^*(n, \eta^*(n)))} \xrightarrow[n \to \infty]{} k^{\frac{1}{k+1}}. \tag{12}$$

Interestingly, we notice that this ratio of variances (characterizing the improvement when setting the optimal minimum precision level) depends on $k$, but not on $c$. (This holds even before the asymptotic regime, as long as $n$ is large enough such that both $\lambda^*(n)$ and $\eta^*(n)$ are strictly smaller than $1/\sigma^2$.)

Figure 1 illustrates the asymptotic improvement ratio (12) for different values of $k$. We observe that it is bounded, it goes to 1 for large $k$'s and it is in the range of 25-30% improvement for values of $k$ around 2-10. Given that the ratio (11) converges towards its asymptote from above, this asymptotic improvement represents a lower bound of the improvement the analyst can achieve by implementing our mechanism with any finite number of agents $n$.

Fig. 1: Asymptotic improvement of the estimation choosing the optimum precision level $\eta^*$ for values of $k = 2, \dots, 10$ and for values of $k = 2, \dots, 500$.
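The closed-form quantities of this special case are easy to evaluate numerically. The sketch below assumes the monomial cost $c\lambda^k$, the linear estimation cost $1/\sum_i \lambda_i$, the equilibrium precision $\lambda^*(n) = (1/(ckn^2))^{1/(k+1)}$ and the threshold $\eta^*(n) = (1/(cn(n-1)))^{1/(k+1)}$, both capped at $1/\sigma^2$. The function names are hypothetical, and `sigma2` stands for $\sigma^2$.

```python
def lam_star(n, c, k, sigma2):
    """Equilibrium precision without a minimum level, capped at 1/sigma^2."""
    return min((1.0 / (c * k * n ** 2)) ** (1.0 / (k + 1)), 1.0 / sigma2)


def eta_star(n, c, k, sigma2):
    """Optimal minimum precision level (needs n >= 2), capped at 1/sigma^2."""
    return min((1.0 / (c * n * (n - 1))) ** (1.0 / (k + 1)), 1.0 / sigma2)


def variance(n, precision):
    """Variance of the estimate when all n agents use the same precision."""
    return 1.0 / (n * precision)


def improvement(n, c, k, sigma2):
    """Ratio of the equilibrium variances without and with the minimum level."""
    without_floor = variance(n, lam_star(n, c, k, sigma2))
    with_floor = variance(n, eta_star(n, c, k, sigma2))
    return without_floor / with_floor


if __name__ == "__main__":
    # Compare the finite-n ratio with its asymptote k**(1/(k+1)).
    for k in (2, 5, 10):
        print(k, improvement(1000, 1.0, k, 0.5), k ** (1.0 / (k + 1)))
```

Evaluating `improvement` for two different values of `c` at the same `n` and `k` is a quick way to see that the ratio does not depend on the privacy cost scale while both precisions are below the cap.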

V. The Heterogeneous Agent Case 

The previous section presents an exhaustive analysis of our model in the homogeneous case, i.e., when the agents exhibit the same privacy concerns. This simplified approach enables us to derive a first set of concrete results, intuition and qualitative understanding of the model and of the minimum contribution level mechanism. The results directly apply to homogeneous populations, and can serve as a first approximation by the analyst in other cases, i.e., whenever she does not have specific information about the agents. Indeed, the results are functions only of the total number of agents, and in practice this could represent the only available detail about the agents whose data is stored in the data repository. However, not all populations are homogeneous in their privacy concerns, and having more details about the different privacy concerns of the agents allows for a customized analysis. Measuring how individuals value their private information is non-trivial, but researchers have conducted direct measurement surveys [57], [65] and various laboratory/field experiments [58], [59] allowing for an approximate ranking of users' privacy concerns, and context-specific valuations.

To this end, we now extend our approach to the case in which the analyst faces a heterogeneous population. In this section, we remove the restrictive hypothesis of homogeneity of the agents, and we allow them to exhibit different privacy concerns. Formally, the privacy cost function of an agent $i \in N$ is equal to $c_i(\cdot)$, where all the $c_i$'s satisfy Assumption 1 but may be different from each other.

In order to model this situation, we follow the same approach that we used for the homogeneous case, i.e., we first analyze the situation in which the analyst implements the game $\Gamma$, without restricting the set of agents and without introducing a minimum precision level. Thereafter, we show how the analyst can improve the estimation by implementing a modified game $\Gamma(S, \eta)$.

A. The Estimation Game in the Heterogeneous Case 

We start by analyzing the game $\Gamma$ where each agent's action set is $[0, 1/\sigma^2]$. As in the homogeneous case, we know that the equilibrium of the game $\Gamma$ exists and is unique because we are considering a special case of the game in [18]. However, we can now characterize the equilibrium in more detail. The first result of the section is presented in the following theorem.

Theorem 4: Assume that the privacy costs satisfy $c_1'(\lambda) \le \cdots \le c_n'(\lambda)$, for all $\lambda \in [0, 1/\sigma^2]$. Then, game $\Gamma$ has a unique Nash equilibrium $\boldsymbol{\lambda}^*$ s.t. $0 < \lambda_n^* \le \cdots \le \lambda_1^*$.

Theorem 4 assumes that the agents can be ordered in such a way that, for any precision level $\lambda \in [0, 1/\sigma^2]$, an agent choosing precision $\lambda$ has higher marginal privacy cost (and hence higher privacy cost since $c_i(0) = 0$ for all agents) than the previous agents if they choose the same precision level. This may require some re-ordering from the initial ordering, which comes without loss of generality. We believe that this assumption will often be reasonable in practice since agents who are more reluctant to increase the precision of their revealed data from a small precision (i.e., have higher marginal privacy cost for a small $\lambda$) will likely be more reluctant to increase the precision of their revealed data from a large precision (i.e., have higher marginal cost for a large $\lambda$ too).

The proof of Theorem 4 exploits the potential nature of the game to characterize the Nash equilibrium. The unique Nash equilibrium, which cannot be written in closed form, can be easily computed as the minimum of the (convex) potential function of the game $\Gamma$, which is the function $\Phi : [0, 1/\sigma^2]^n \to \mathbb{R}_+$ s.t., for each $\boldsymbol{\lambda} \in [0, 1/\sigma^2]^n$,

$$\Phi(\boldsymbol{\lambda}) = \sum_{i \in N} c_i(\lambda_i) + f(\boldsymbol{\lambda}). \tag{13}$$

We observe that, in the heterogeneous case, due to the asymmetry of the model, we no longer have a symmetric equilibrium. Moreover, the equilibrium strategy cannot be written as a function of the total number of agents $n$, as it depends on their privacy cost functions. We will use the notation $\boldsymbol{\lambda}^* = \boldsymbol{\lambda}^*(N)$ to denote that the equilibrium depends on the specific identity of the agents in the set of agents $N$. As expected, at equilibrium, agents with higher privacy concerns select lower precisions and, as for the homogeneous case, no agent decides to deny access to her data. The fact that every agent contributes positively at Nash equilibrium stems from our assumption that giving a small amount of data implies very little cost since the marginal cost at zero is zero ($c_i'(0) = 0$). (Note, though, that some agents may contribute arbitrarily close to zero.) This assumption, although realistic, is not strictly necessary; but it greatly simplifies the presentation of our model and results.

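Given the unique Nash equilibrium $\boldsymbol{\lambda}^*(N)$, the variance (3) of the estimate of $\mu$ obtained by the analyst at equilibrium is given by the following expression:

$$\sigma_M^2(\boldsymbol{\lambda}^*(N)) = \frac{1}{\sum_{i \in N} \lambda_i^*(N)}. \tag{14}$$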

Even if the equilibrium precisions chosen by the agents (and the corresponding variances) are not functions of only $n$, we can still generalize Propositions 1 and 2 to the heterogeneous case. In Propositions 3 and 4, we analyze how the equilibrium precision and the variance of the estimate at equilibrium vary when a new additional agent enters the game. Note that the following two propositions do not use the ordering assumption of Theorem 4.

Proposition 3: Given the game $\Gamma$, suppose that an additional $(n+1)$-th agent enters the game, and denote by $\boldsymbol{\lambda}^*(N \cup \{n+1\})$ the new equilibrium precision level. Then, for each $i \in N$, $\lambda_i^*(N \cup \{n+1\}) < \lambda_i^*(N)$.

Proposition 3 states that the equilibrium contribution of each agent decreases as soon as a new agent enters the game.

Proposition 4: Given the game $\Gamma$, suppose that an additional $(n+1)$-th agent enters the game. Then, $\sigma_M^2(\boldsymbol{\lambda}^*(N \cup \{n+1\})) < \sigma_M^2(\boldsymbol{\lambda}^*(N))$.

Proposition 4 shows that, for the analyst, it is always better to let new agents enter the game despite the fact that, in doing so, every other agent gives data with a lower precision. Surprisingly, this is true even if the agent who enters has higher privacy concerns than any other agent in the game, and would accordingly contribute the lowest quality data.

B. The Modified Estimation in the Heterogeneous Case 

We now move to the case where the analyst can restrict the set of agents by introducing a minimum precision level $\eta \in [0, 1/\sigma^2]$. Again, her final goal is to improve the estimation accuracy. We consider at first the set of agents $S \subseteq N$ to be fixed, and we analyze how the estimation varies while moving only the parameter $\eta$. This variant is modeled by the game $\Gamma(S, \eta)$ defined in Section III-D, where $\eta$ is now the only variable of the model. We denote by $\boldsymbol{\lambda}^*(S)$ the equilibrium precision level for the game $\Gamma(S, 0)$, and we suppose that it is such that there exists at least one agent $i \in S$ s.t. $\lambda_i^*(S) \neq 1/\sigma^2$; otherwise the estimation is already optimal with variance $\sigma^2/s$ for $\eta = 0$.

The next result extends Theorem 2 to the heterogeneous case. We show that, if the analyst selects a minimum precision level which is not “too high”, at equilibrium, all the agents (even the most concerned about privacy) are still willing to authorize access to their data (with perturbation).

Theorem 5: As in Theorem 4, assume that the privacy costs satisfy $c_1'(\lambda) \le \cdots \le c_n'(\lambda)$, for each $\lambda \in [0, 1/\sigma^2]$. Given the set of agents $S \subseteq N$, with cardinality $s \ge 1$:

(i) if $s = 1$, then for any $\eta \in [0, 1/\sigma^2]$, $\Gamma(S, \eta)$ has a unique Nash equilibrium $\lambda^*(S, \eta) = \max\{\lambda^*(S), \eta\}$;

(ii) if $s > 1$, then there exists a parameter $\eta^*(S) \in (\lambda_n^*(S), 1/\sigma^2]$ such that, for any $\eta \in [0, \eta^*(S)]$, $\Gamma(S, \eta)$ has a unique Nash equilibrium $\boldsymbol{\lambda}^*(S, \eta)$ with $\lambda_i^*(S, \eta) > 0$ for all $i \in S$.

Theorem 5 introduces a parameter $\eta^*(S)$ such that if the analyst sets a minimum precision level in $[0, \eta^*(S)]$, even the most privacy-concerned of the agents in $S$ does not have an incentive to deviate to a zero precision level. As the theorem is stated, $\eta^*(S)$ is not unique (any value smaller than a valid $\eta^*(S)$ but still larger than $\lambda_n^*(S)$ will be suitable). However, let $\eta^*(S)$ be s.t.

$$c_n(\lambda_n^*(S, \eta^*(S))) = F\left( \frac{1}{\sum_{j \in N, j \neq n} \lambda_j^*(S, \eta^*(S))} \right) - F\left( \frac{1}{\sum_{j \in N} \lambda_j^*(S, \eta^*(S))} \right), \tag{15}$$

where $\boldsymbol{\lambda}^*(S, \eta^*(S))$ is the local minimum of the potential function $\Phi$ defined as in (13), but on the domain $[\eta^*(S), 1/\sigma^2]^s$. We can prove that this $\eta^*(S)$ is unique, that it satisfies Theorem 5-(ii), and we conjecture that this definition gives the largest possible parameter satisfying Theorem 5-(ii).

The result of Theorem 5 allows us to establish the main result of this section:

Theorem 6: As in Theorem 4, assume that the privacy costs satisfy $c_1'(\lambda) \le \cdots \le c_n'(\lambda)$, for each $\lambda \in [0, 1/\sigma^2]$. Let $\eta^*(N)$ be as in Theorem 5-(ii) for $S = N$. The analyst can improve the estimation by implementing the game $\Gamma(N, \eta^*(N))$ with minimum precision level $\eta^*(N)$, i.e.,

$$\sigma_M^2(\boldsymbol{\lambda}^*(N, \eta^*(N))) < \sigma_M^2(\boldsymbol{\lambda}^*(N)).$$


Theorem 6 shows that the analyst can improve the precision of the estimation of the mean $\mu$ simply by setting a minimum precision level and soliciting access to the data from all the agents in $N$. This is true for any minimum precision level $\eta^*(N)$ such that Theorem 5-(ii) is satisfied and shows that, even in the heterogeneous case, it is possible to strictly improve the estimation by applying the minimum precision level mechanism. Here too, however, we conjecture that the parameter $\eta^*(N)$ solving (15) yields the highest possible improvement.

C. The Special Heterogeneous Case with Monomial Privacy 
Costs and Linear Estimation Cost 

As for the homogeneous case, we now illustrate the results of the previous sections on the heterogeneous model in the special case of monomial privacy cost and linear estimation cost. In this simplified model, the cost function in (4) has the form

$$J_i(\lambda_i, \boldsymbol{\lambda}_{-i}) = c_i \lambda_i^k + \sigma_M^2(\boldsymbol{\lambda}), \tag{16}$$

with $c_i \in (0, \infty)$ for each $i \in N$ and $k \ge 2$. The assumption of Theorem 4 that agents can be ordered s.t. $c_1'(\lambda) \le \dots \le c_n'(\lambda)$ for each $\lambda \in [0, 1/\sigma^2]$ translates now to requiring that $0 < c_1 \le \dots \le c_n$ (which, in the case of monomial costs, is completely without loss of generality).

Even with such a simplified model, having heterogeneous agents does not allow us to write the key quantities in closed form as we did in the simplified homogeneous model in Section IV-C. However, it is still possible to provide clearer expressions and to quantify the variance improvement obtained by setting a minimum precision level.

When the agents play the estimation game $\Gamma$, at equilibrium they choose a precision level that, if interior, can be written as

$$\lambda_i^*(N) = \left( \frac{1}{c_i k \left( \sum_{j \in N} \lambda_j^*(N) \right)^2} \right)^{\frac{1}{k-1}}.$$
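Because each interior precision depends on the other agents only through the total precision, the heterogeneous equilibrium can be computed with a one-dimensional search. The sketch below assumes monomial costs $c_i \lambda_i^k$ with $k \ge 2$, the linear estimation cost $1/\sum_j \lambda_j$, and the first-order condition $c_i k \lambda_i^{k-1} = 1/(\sum_j \lambda_j)^2$ with each precision capped at $1/\sigma^2$. It bisects on the total precision rather than minimizing the potential function, and the name `heterogeneous_equilibrium` is hypothetical.

```python
def heterogeneous_equilibrium(costs, k, lam_max, tol=1e-12):
    """Equilibrium precisions for monomial costs costs[i] * lam**k.

    Assumptions: k >= 2, linear estimation cost 1 / sum(lam), and
    lam_max stands for the upper bound 1 / sigma^2.
    """
    def best_responses(total):
        # Precision each agent would choose if the total precision were `total`.
        return [
            min((1.0 / (c_i * k * total ** 2)) ** (1.0 / (k - 1)), lam_max)
            for c_i in costs
        ]

    # The sum of best responses decreases in `total`, so the consistent
    # total (sum of responses equal to `total`) is unique.
    lo, hi = 0.0, len(costs) * lam_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(best_responses(mid)) > mid:
            lo = mid
        else:
            hi = mid
    return best_responses(0.5 * (lo + hi))


if __name__ == "__main__":
    precisions = heterogeneous_equilibrium([1.0, 1.5, 2.0, 2.5, 3.0], 3, 2.0)
    print(precisions)
    print("variance of the estimate:", 1.0 / sum(precisions))
```

With identical costs the output reduces to the homogeneous equilibrium, which gives a simple consistency check of the reduction to a scalar problem.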


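The analyst can improve the estimation by setting a minimum precision level $\eta^*(N)$. In this simplified case, it takes the form

$$\eta^*(N) = \left( \frac{1}{c_n \left( \sum_{j \in N} \lambda_j^*(\eta^*(N)) \right) \left( \sum_{j \in N \setminus \{n\}} \lambda_j^*(\eta^*(N)) \right)} \right)^{\frac{1}{k-1}}.$$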

Note that the two expressions above are in the form of fixed-point equations. It is interesting to note, though, that when $k \ge c_n/c_1$, i.e., when the privacy costs of the agents are not too dispersed, this minimum precision level can be written in closed form as

$$\eta^*(N) = \left( \frac{1}{c_n n (n-1)} \right)^{\frac{1}{k+1}}. \tag{17}$$

It is then equal to the optimal precision level when all the agents have the same privacy cost as the most privacy-concerned individual.
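The closed form (17) can be checked against the participation logic on a small example. The sketch below assumes monomial costs $c_i \lambda_i^k$, the linear estimation cost $1/\sum_j \lambda_j$, a threshold below $1/\sigma^2$, and that every agent contributes exactly the minimum level. It then verifies that the most privacy-concerned agent is indifferent between staying and dropping out, and that no agent wants to raise her precision. All function names are hypothetical.

```python
def closed_form_threshold(costs, k):
    """Minimum precision level of (17), driven by the largest cost coefficient."""
    n = len(costs)
    return (1.0 / (max(costs) * n * (n - 1))) ** (1.0 / (k + 1))


def agent_cost(c_i, k, own, others_total):
    """Privacy cost plus linear estimation cost 1 / (total precision)."""
    return c_i * own ** k + 1.0 / (own + others_total)


def marginal_cost_at(c_i, k, own, others_total):
    """Derivative of agent_cost with respect to the agent's own precision."""
    return c_i * k * own ** (k - 1) - 1.0 / (own + others_total) ** 2


if __name__ == "__main__":
    costs = [1.0, 1.5, 2.0, 2.5, 3.0]  # here k = 3 >= c_n / c_1 = 3
    k = 3
    eta = closed_form_threshold(costs, k)
    others = (len(costs) - 1) * eta
    for c_i in costs:
        stay = agent_cost(c_i, k, eta, others)
        leave = agent_cost(c_i, k, 0.0, others)
        slope = marginal_cost_at(c_i, k, eta, others)
        print(c_i, stay, leave, slope)
```

A non-negative slope at the minimum level means the lower bound is binding for that agent, which is the situation in which all precisions coincide with the threshold.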

Figure 2 illustrates on an example the estimation improvement in the heterogeneous case when choosing $\eta^*(N)$ as above (which we conjectured is the optimal choice). We compare it with the improvement in the analogous homogeneous case when choosing the optimal $\eta^*(n)$ (see Theorem 5), which does not depend on $c$.













Fig. 2: Improvement of the estimation in $\Gamma(N, \eta)$ in the heterogeneous case choosing the optimum precision level $\eta^*(N)$, compared to the homogeneous case choosing the optimum precision level $\eta^*(n)$; for values of $k = 2, \dots, 20$. In this example, $c = (1, 1.5, 2, 2.5, 3)$, $1/\sigma^2 = 2$.


VI. Extensions of the Model 


In this section, we extend our model in two directions. In Section VI-A, we propose an alternative modified estimation game, and we compare it with the one proposed in Section III-D. The main difference with the previous one is that it is a two-stage game. In Section VI-B, we add an important variable to our model by introducing a per-agent cost of collecting data. Both proposed extensions are included to derive qualitative insights about the practical applicability of the model; however, we defer an in-depth analysis to future work.


A. The Modified Two-Stage Game 

In $\Gamma(N, \eta)$, both the decision to authorize access (or to deny it) and the selection of a precision level (in case of authorization) are simultaneous. This variant realistically captures cases where the analyst requests access to data already present in a repository. In different applications, however, the analyst may first recruit participants that commit to provide private data with a minimum precision; and only in a second stage (for example, as soon as the data becomes available), these agents would be asked to disclose their information. This scenario applies, for example, to medical research studies or consumer decisions, and it motivates the study of a model where agents first decide to participate or not, and only then decide on the precision of the data released. Another motivation to study such a model is that it could lead to a higher estimation accuracy, in which case the analyst would want to implement it even if it does not naturally arise from the application at stake.

In this section, we investigate this extension of our original model in the simplified case with homogeneous monomial privacy costs and linear estimation cost, as it is sufficient to understand and illustrate the qualitative differences between the two models. We leave the development of the more general model to future work. We also point out the possibility, for future work, of a similar extension in which the agents asynchronously make decisions on whether or not to share their data (i.e., they make their sharing decisions based on actions taken by agents who were contacted earlier by the analyst). However, in the absence of observability of the contribution decisions (as is often the case in the medical domain due to confidentiality restrictions), even asynchronous decision-making can be approximated well with a simultaneous move model.

To investigate our variant of the model and to compare its outcome with the one of the game $\Gamma(N, \eta)$, we define a two-stage variant of the game. We assume that the agents are initially informed of the minimum precision level $\eta$. In a first stage, they have to decide if they want to deny access to their data, and exit the game, or if they wish to accept to authorize access. The set of agents who accepted to participate is revealed to all agents. In a second stage, the agents who decided to participate choose their precision in the imposed range $[\eta, 1/\sigma^2]$. Formally, this situation is modeled through the following two-stage game $\Gamma_2(\eta)$:

(i) In the first stage, the agents make a binary choice $p_i \in \{0, 1\}$:

$$p_i = \begin{cases} 0 & \text{if } i \text{ denies the access} \\ 1 & \text{if } i \text{ accepts to authorize.} \end{cases}$$

We denote by $\mathbf{p} \in \{0, 1\}^n$ a strategy profile, $P = \{i \in N : p_i = 1\}$ the set of agents who accept, and $p = |P|$ its cardinality.

(ii) In the second stage, given $\mathbf{p} \in \{0, 1\}^n$, the agents play a game $\Gamma_{\mathbf{p}}(\eta) = (N, [\eta, 1/\sigma^2]^p \times \{0\}^{n-p}, (J_i)_{i \in N})$, where each agent $i \in P$ has strategy space $[\eta, 1/\sigma^2]$ whereas each agent $i \in N \setminus P$ has strategy space $\{0\}$ (i.e., the agents in $N \setminus P$ can only choose $\lambda_i = 0$, which, we reiterate, is equivalent to no participation).

We have already seen that the analyst can improve the estimation by modifying the original game, and that the optimal choice, in that previous setting (in the homogeneous case), is to implement the game $\Gamma(N, \eta^*(n))$. We now investigate whether the analyst can improve the estimation even more, while implementing the game $\Gamma_2(\eta)$ for an optimal choice of the minimum precision level $\eta$.

The games $\Gamma(N, \eta)$ and $\Gamma_2(\eta)$ differ in the information available to agents when choosing their precision (observe for instance that $\Gamma(N, 0) = \Gamma$, while $\Gamma_2(0) \neq \Gamma$). In $\Gamma(N, \eta)$, both the decision to authorize access or to deny it and the decision of the precision (in case of authorization) are simultaneous. In contrast, in $\Gamma_2(\eta)$, the set of agents who will authorize with precision of at least $\eta$ is known at the time of choosing the exact precision.

As we did for the previous games, we study $\Gamma_2(\eta)$ as a complete information game between the agents, i.e., we assume that the set of agents, the action sets (in particular, when present, the value of the parameter $\eta$) and the costs are known by all the agents.

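We study the pure strategy Nash equilibria of the game using backward induction. Given $\mathbf{p} \in \{0, 1\}^n$, the outcome of the first stage, a Nash equilibrium for the second stage is a strategy profile $\boldsymbol{\lambda}^* \in [\eta, 1/\sigma^2]^p \times \{0\}^{n-p}$ s.t., for each $i \in N \setminus P$, $\lambda_i^* = 0$, while for each $i \in P$, $\lambda_i^*$ is s.t.

$$\lambda_i^* \in \operatorname*{arg\,min}_{\lambda_i \in [\eta, 1/\sigma^2]} J_i(\lambda_i, \boldsymbol{\lambda}_{-i}^*). \tag{18}$$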

If for each $\mathbf{p} \in \{0, 1\}^n$ the second stage game has a unique solution $\boldsymbol{\lambda}^*(\mathbf{p}, \eta)$ (as we will see, this is the case in our model), the choice that the agents make in the first stage uniquely determines the outcome of the two-stage game. Then, $\Gamma_2(\eta)$ reduces to a one-stage game $(N, \{0, 1\}^n, (J_i^1)_{i \in N})$, where the cost function $J_i^1 : \{0, 1\}^n \to \mathbb{R}$ is given, for each $\mathbf{p} \in \{0, 1\}^n$ and for all $i \in N$, by

$$J_i^1(\mathbf{p}) = J_i(\boldsymbol{\lambda}^*(\mathbf{p}, \eta)) = c \lambda_i^*(\mathbf{p}, \eta)^k + \sigma_M^2(\boldsymbol{\lambda}^*(\mathbf{p}, \eta)). \tag{19}$$


Then, an equilibrium of the game $\Gamma^2(\eta)$ is a strategy profile $\mathbf{p}^* \in \{0,1\}^n$ s.t.

$$p_i^* \in \arg\min_{p_i \in \{0,1\}} J_i^2(p_i, \mathbf{p}_{-i}^*), \quad \forall i \in N. \tag{20}$$


We apply backward induction, starting with the analysis of the second stage game. In the following lemma, we show the existence and the uniqueness of a Nash equilibrium for the game $\Gamma^p(\eta)$ when $\mathbf{p} \neq (0, \ldots, 0)$.

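Lemma 1: For each $\mathbf{p} \in \{0,1\}^n \setminus \{(0, \ldots, 0)\}$, the game $\Gamma^p(\eta)$ has a unique Nash equilibrium $\lambda^*(\mathbf{p}, \eta)$, s.t. $\lambda_i^*(\mathbf{p}, \eta) = \lambda^*(p, \eta)$ for each $i \in P$, with

$$\lambda^*(p, \eta) = \begin{cases} \lambda^*(p) & \text{if } 0 \leq \eta \leq \lambda^*(p), \\ \eta & \text{if } \lambda^*(p) < \eta \leq 1/\sigma^2, \end{cases} \tag{21}$$

(where $\lambda^*(p)$ is defined as in (10)) and $\lambda_i^*(\mathbf{p}, \eta) = 0$ for each $i \in N \setminus P$.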

We observe again that the equilibrium of the game restricted to agents in $P$ is symmetric (i.e., each participating agent chooses the same precision level at equilibrium). We call $\lambda^*(p, \eta)$ the equilibrium precision for agents in $P$ to emphasize the dependence on the cardinality $p$ of the set $P$ and on the parameter $\eta$. In fact, due to the symmetry, the optimal choice for an agent who decided to participate depends only on the number of agents who made the same choice as her in the first stage and not on the identity of these agents. Further, given $P$ and $\eta$, the equilibrium of $\Gamma^p(\eta)$ is the same as the equilibrium of $\Gamma(N, \eta)$ given in Theorem 2 when replacing $n$ by $p$. The only difference is that, even for large $\eta$, the agents in $P$ will choose precision level $\eta$ in $\Gamma^p(\eta)$ since they are committed to participate with precision of at least $\eta$. Consequently, the equilibrium of $\Gamma^p(\eta)$ always exists and it is such that each agent chooses a non-zero precision level.
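The piecewise rule in (21) is easy to evaluate once $\lambda^*(p)$ is known. The sketch below is only an illustration under stated assumptions: it takes the monomial privacy cost $c(\lambda) = c\lambda^k$ and the estimation cost $f = \sigma_M^2$ that appear in (19), for which the interior first-order condition $ck\lambda^{k-1} = 1/(p\lambda)^2$ has a closed-form solution. The function names and default parameter values are hypothetical.

```python
def lambda_star(p, c=1.0, k=2, sigma2=1.0):
    """Equilibrium precision of p homogeneous agents when eta = 0.

    Assumption (illustration only): c(lam) = c * lam**k and
    f = sigma_M^2 = 1 / sum(lam). The interior first-order condition
    c*k*lam**(k-1) = 1/(p*lam)**2 gives lam = (c*k*p**2)**(-1/(k+1)),
    which is then capped at the maximum precision 1/sigma2.
    """
    if p < 1:
        raise ValueError("need at least one participating agent")
    interior = (c * k * p**2) ** (-1.0 / (k + 1))
    return min(interior, 1.0 / sigma2)


def second_stage_precision(p, eta, c=1.0, k=2, sigma2=1.0):
    """Common precision of the p participants in the second stage, as in (21)."""
    if not 0.0 <= eta <= 1.0 / sigma2:
        raise ValueError("eta must lie in [0, 1/sigma2]")
    # Participants play lambda*(p) unless the minimum precision eta is binding.
    return max(lambda_star(p, c, k, sigma2), eta)
```

Because the participants are committed to at least $\eta$, the returned value is never zero, which mirrors the remark that the second-stage equilibrium always exists.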

As the second stage game always has a unique solution, we can apply backward induction, and the two-stage game $\Gamma^2(\eta)$ reduces to a one-stage game. The following lemma establishes the existence and uniqueness of its equilibrium for a minimum precision level.

Lemma 2: For any $\eta \in [\lambda^*(n-1), \eta^*(n)]$, the two-stage game $\Gamma^2(\eta)$ has a unique equilibrium given by $\mathbf{p}^* = (1, \ldots, 1)$.


Lemma 2 states that, if the analyst chooses a minimum precision level in the range $[\lambda^*(n-1), \eta^*(n)]$ and implements the two-stage game $\Gamma^2(\eta)$, then each agent will participate at equilibrium. The equilibrium contributions, given by Lemma 1, equal $\eta$ for each agent since $\eta \geq \lambda^*(n-1) \geq \lambda^*(n)$. For $\eta$ in the range $[\lambda^*(n-1), \eta^*(n)]$, the outcome of the two-stage game $\Gamma^2(\eta)$ is therefore the same as for the one-stage game $\Gamma(N, \eta)$. This is not the case, however, for other ranges of parameters. In particular, for $\eta < \lambda^*(n-1)$, all agents participate in $\Gamma(N, \eta)$ whereas they may not participate in $\Gamma^2(\eta)$. This is because, in $\Gamma^2(\eta)$, agents react in the second stage to the participation decisions of the first stage (typically, if an agent chooses not to participate, the others increase their precisions in the second stage). As a consequence, agents can strategically choose their participation in the first stage to influence the precisions chosen in the second stage. Analysis of the existence and uniqueness of the Nash equilibrium in $\Gamma^2(\eta)$ in ranges of $\eta$ outside $[\lambda^*(n-1), \eta^*(n)]$ is therefore more intricate. Nevertheless, we can establish our main result, namely that choosing $\eta = \eta^*(n)$ yields an optimal estimation variance for the analyst:

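Theorem 7: For the game $\Gamma^2(\eta)$, with $\eta \in [0, 1/\sigma^2]$, the estimate’s variance at equilibrium $\sigma_M^2(\lambda^*(\mathbf{p}^*, \eta))$ is minimized for $\eta = \eta^*(n)$. The improvement obtained by setting the minimum precision level $\eta = \eta^*(n)$ is characterized, for $n$ large enough, by the ratio

$$\frac{\sigma_M^2(\lambda^*(n))}{\sigma_M^2(\lambda^*(\mathbf{p}^*, \eta^*(n)))} = \left(\frac{kn}{n-1}\right)^{\frac{1}{k+1}} > 1, \quad (k \geq 2).$$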


Theorem 7 shows that the optimal $\eta$ is the same for the one-stage game $\Gamma(N, \eta)$ and the two-stage game $\Gamma^2(\eta)$, and both yield the same improvement for the analyst. As such, the discussion given in Section IV-B about the asymptotic behavior of this gain still holds. However, as mentioned, the two games $\Gamma(N, \eta)$ and $\Gamma^2(\eta)$ are not equivalent for each choice of the parameter $\eta$. In particular, we can infer from the proof of Theorem 7 that there is still a range of minimum precision levels for which the estimation is strictly improved, but this range is smaller than it was for $\Gamma(N, \eta)$.
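To make the size of the gain concrete, the following sketch compares the equilibrium variance of the original game with the variance obtained at $\eta = \eta^*(n)$. It is an illustration only and rests on assumptions: the monomial cost $c(\lambda) = c\lambda^k$ with $f = \sigma_M^2$ as in (19), and an interior solution (the cap $1/\sigma^2$ not binding). Under these assumptions $\lambda^*(n)$ solves $ck\lambda^{k-1} = 1/(n\lambda)^2$ and $\eta^*(n)$ solves $c\eta^k = 1/((n-1)\eta) - 1/(n\eta)$, so the variance ratio simplifies to $(kn/(n-1))^{1/(k+1)}$. All function names are hypothetical.

```python
def lambda_star(n, c=1.0, k=2, sigma2=1.0):
    """Equilibrium precision without a minimum level (interior solution, capped)."""
    return min((c * k * n**2) ** (-1.0 / (k + 1)), 1.0 / sigma2)


def eta_star(n, c=1.0, k=2, sigma2=1.0):
    """Largest minimum precision at which no agent prefers to drop out.

    Solves c*eta**k = 1/((n-1)*eta) - 1/(n*eta) = 1/(n*(n-1)*eta),
    capped at the maximum precision 1/sigma2.
    """
    if n < 2:
        return 1.0 / sigma2  # a lone agent never gains by dropping out
    return min((c * n * (n - 1)) ** (-1.0 / (k + 1)), 1.0 / sigma2)


def improvement_ratio(n, c=1.0, k=2, sigma2=1.0):
    """Variance without a minimum level divided by variance at eta = eta*(n)."""
    var_free = 1.0 / (n * lambda_star(n, c, k, sigma2))
    var_min = 1.0 / (n * eta_star(n, c, k, sigma2))
    return var_free / var_min


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Second column: numerical ratio; third column: closed form for k = 2.
        print(n, improvement_ratio(n), (2 * n / (n - 1)) ** (1.0 / 3))
```

As $n$ grows, the ratio approaches $k^{1/(k+1)}$ under these assumptions, which is independent of the cost scale $c$.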


B. The Estimation Game in the Presence of Per-Agent Costs 

In this section, we propose an extension of our model to include the cost of collecting data. Indeed, in Section III and throughout this paper, we assumed that data is collected at no cost, and that the analyst aims at minimizing the variance of the mean estimation. The absence of per-agent cost (to solicit contributions) is a standard assumption in most of the public good literature. However, it could limit the appeal of our model in some applications. Here, we present preliminary results with arbitrary per-agent cost, restricted to the homogeneous case. We then introduce a simplified case with linear per-agent cost, to illustrate the qualitative difference to the zero per-agent cost case, in particular, the existence of an optimal number of agents $n$. The derivation of the optimal $n$ would be slightly different when assuming, for example, a concave cost function. This is left for future work (see Section VII).

When facing a per-agent cost, we can no longer rely on the fact that the analyst will always prefer to have the largest possible set of agents. Rather, she has to select the optimal subset of agents to include in the game. In the homogeneous case, selecting the optimal subset of agents reduces to selecting the optimal number of agents $n^* \in \mathbb{N}$. To address this problem, we assume that, instead of aiming at minimizing the variance, the analyst aims at minimizing a cost function $J_A : \mathbb{N}^* \to \mathbb{R}$ defined as

$$J_A(n) = f(\eta^*(n)) + Cn, \tag{22}$$

where $f$ is the estimation cost defined in Section III while $C$ represents the per-agent cost of collecting personal data. We assume that the estimation cost is evaluated at equilibrium, when the analyst chooses the optimal minimum precision level. In fact, for a fixed $n$, $\eta^*(n)$ provides the minimum variance and, consequently, the minimum estimation cost. The problem of the analyst now reduces to setting an optimal number of agents $n^*$.

Theorem 8: The function $J_A(n)$ has a minimum in $\mathbb{N}^*$. The optimal $n^*$ is given by $n^* = \max\{m \in \mathbb{N}^* \mid c(\eta^*(m)) > C\}$, if this set is non-empty, and by 1 otherwise.

Theorem 8 shows how the analyst can optimize the balance between the minimization of the estimation cost and the per-agent recruitment cost. In this situation, it is typically no longer optimal to contact as many agents as possible. Of course, if the theoretically determined optimal number of agents equals or exceeds the size of the potential participant pool ($n^* \geq n$), then the analyst will contact all available agents. As $c(\eta^*(m))$ is non-increasing in $m$, $n^*$ can be easily computed by the analyst, for example by implementing a bisection method on $[1, n]$, where $n$ is the total number of agents whose data is contained in the repository.
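The bisection mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the monomial privacy cost $c(\lambda) = c\lambda^k$ with $f = \sigma_M^2$, for which $\eta^*(m)$ has the closed form used in `eta_star` (an assumption made for this example), and it relies only on $c(\eta^*(m))$ being non-increasing in $m$. The strict inequality follows the statement of Theorem 8 as printed; all names are hypothetical.

```python
def eta_star(m, c=1.0, k=2, sigma2=1.0):
    """Closed form of eta*(m) under the assumed cost c(lam) = c * lam**k, f = sigma_M^2."""
    if m < 2:
        return 1.0 / sigma2
    return min((c * m * (m - 1)) ** (-1.0 / (k + 1)), 1.0 / sigma2)


def optimal_num_agents(n, C, c=1.0, k=2, sigma2=1.0):
    """Largest m in [1, n] with c(eta*(m)) > C, or 1 if no such m exists."""

    def privacy_cost(m):
        return c * eta_star(m, c, k, sigma2) ** k

    if privacy_cost(1) <= C:
        return 1  # the set {m : c(eta*(m)) > C} is empty
    lo, hi = 1, n  # invariant: privacy_cost(lo) > C
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the interval always shrinks
        if privacy_cost(mid) > C:
            lo = mid
        else:
            hi = mid - 1
    return lo
```

The search needs about $\log_2 n$ evaluations of $c(\eta^*(\cdot))$, and when the unconstrained optimum exceeds the pool size it simply returns $n$.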

VII. Concluding remarks 

In this paper, we investigate the problem of estimating 
population averages from data provided by privacy-sensitive 
users. We assume that users can perturb their data before 
revealing it (e.g., by adding zero-mean noise) in order to 
protect their privacy. Users, however, benefit from a more 
accurate population estimate. Therefore, each user strategically 
selects the precision of her revealed data to balance her 
privacy cost and the cost incurred by a lower precision of 
the population estimate. We find that the resulting game has 
a unique Nash equilibrium and carefully study its properties. 

We further prove that the analyst can increase the population 
estimate’s accuracy simply by imposing a minimum precision 
level for the data which users can reveal (e.g., by restricting 
the variance of the noise users can add). The surprising and 
important aspect of this result is that the scheme remains 
incentive-compatible, i.e., users are willing to provide data 
with a higher precision rather than dropping out. We also show
how to tune the minimum precision level the analyst should
set in order to optimize the population estimate’s accuracy. In
our numerical simulations, the maximum improvement of the
population estimate’s accuracy is on the order of 20% to 40%.

Our model treats the population estimate’s accuracy as a 
public good (e.g., if one agent increases the precision of 
the data she gives, it benefits all users). Then, our results 
offer a novel method to increase the provision of a public 
good above voluntary contributions, simply by restricting the 
agents’ strategy spaces. This method is attractive through its 
simplicity compared for instance to other schemes that involve 
monetary transfers, and could find application in other public 
good problem domains. 

The results are derived for arbitrary functions for the privacy 
cost experienced by each user and for the estimation cost 
(satisfying relatively mild assumptions). This increases the 
robustness of our main results and allows for application to 
various situations. Further, we study the cases of homogeneous 
and heterogeneous agents. Indeed, for practical utilization of 
our work it is important to be able to accommodate different 
types of privacy preferences as evidenced by the literature 
on the value of privacy (which includes direct measurement 
surveys 0’ (53) and laboratory/field experiments (58), (59)). 

We also consider extensions of our basic model such as 
variations in the structure of decision-making. Introducing 
a two-stage structure impacts the available information to 
individuals, i.e., whether or not the set of contributing agents 
is determined before agents choose their precision levels. 
Surprisingly, we find that providing this information to users 
can never improve the estimation’s accuracy. In future work, 
we plan to analyze other decision-making structures, such as 
when agents make decisions asynchronously and can utilize 
information about the previous contribution levels by other 
agents. 

In our basic model, we assume that the analyst can collect 
data from n users at negligible cost. This assumption can be 
reasonable in scenarios where the data is already available 
in a repository, and the analyst merely has to inquire with 
individuals to contribute their data for secondary analysis. 
In this scenario, we showed that the population estimate’s 
accuracy increases with n (although each individual then 
lowers the precision of her contribution). We further extend 
the model to handle applications where there could be a more 
substantial cost of collecting data per user (e.g., cost of sending 
a survey). In that case, it is no longer optimal for the analyst to 
collect data from all users but we show, in the homogeneous 
case, how the analyst can then select the optimal number of 
users. The method outlined for the homogeneous case also 
provides a trajectory to approach the task of selecting the 
optimal set of agents to solicit data from in the heterogeneous 
case, utilizing the ordering assumption of Theorem 4. Further,
our results regarding the benefits of a minimum precision level 
apply also to costly data acquisition. In future work, we will 
consider non-linear cost (e.g., concave) to further generalize 
our results. 

A unique Nash equilibrium exists for all considered cases and extensions. Computing the exact equilibrium strategies may be non-trivial for agents in practice. However, knowledge about the uniqueness of the optimal strategies suggests the possibility of reaching the equilibrium via tacit coordination when agents gain experience with comparable data contribution decisions [66]. In addition, providing a minimum precision level will further guide agents in their decision-making.

In this paper, we consider the problem of estimating the 
population average of a single scalar quantity. However, the 
results also serve as building blocks to tackle more complex 
scenarios. For example, an analyst may need to estimate 
averages of several quantities which are not independent 
(if the quantities are independent, our results readily apply 
by considering several independent instances of the model, 
possibly with different privacy costs). Further, the analyst may 
want to estimate the parameter of a linear model as in [18]. In
both cases, the problem of selecting the users to solicit data 
from will become combinatorial and requires further study 
to find a suitable approximation. However, our techniques to 
characterize the equilibrium of the modified game will extend 
and will be instrumental in establishing the optimal strategy 
space to impose for a given set of users. 
Appendix

A. Corollary 1 from “Comparing Equilibria” of Milgrom and 
Roberts 

Many of our theoretical contributions rely on a result from the paper “Comparing Equilibria” by Milgrom and Roberts. To help the reader, we present this result here. For simplicity, we replace the hypothesis of “continuous but for upward jumps” with the stronger hypothesis of “continuous”, which is satisfied by the functions to which we apply the theorem. We also adapt the statement of the corollary to a fixed point problem defined on a generic interval $[a, b] \subset \mathbb{R}_+$.

Corollary 1 (Milgrom and Roberts): Let $g(x,t) : [a,b] \times T \to [a,b]$, where $T$ is any partially ordered set. Suppose that for all $t \in T$, $g$ is continuous in $x$. Then $x_L(t) = \inf\{x \mid g(x,t) \leq x\}$ and $x_H(t) = \sup\{x \mid g(x,t) \geq x\}$ are the extreme fixed points of $g$, that is, the lowest and the highest solutions of the equation $g(x,t) = x$. If, in addition, $g$ is monotone non-decreasing in $t$ for all $x \in [a,b]$, then the functions $x_L(\cdot)$ and $x_H(\cdot)$ are monotone non-decreasing, and if $g$ is strictly increasing in $t$, then these functions are strictly increasing.
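A small numerical illustration of the corollary may help intuition. The sketch below approximates $x_L(t)$ and $x_H(t)$ on a uniform grid, directly from their definitions as an infimum and a supremum. The map `g_example` is a hypothetical example chosen for this illustration (it does not appear in the paper): it maps $[0,1]$ into itself, is continuous in $x$ and non-decreasing in $t$, and has the single fixed point $x = t/2$ for $t \in [0, 2]$.

```python
def extreme_fixed_points(g, t, a=0.0, b=1.0, grid=10001):
    """Grid approximation of x_L(t) = inf{x : g(x,t) <= x} and x_H(t) = sup{x : g(x,t) >= x}.

    Assumes g(., t) maps [a, b] into [a, b], so both sets are non-empty
    (they contain b and a, respectively).
    """
    xs = [a + (b - a) * i / (grid - 1) for i in range(grid)]
    x_low = min(x for x in xs if g(x, t) <= x)
    x_high = max(x for x in xs if g(x, t) >= x)
    return x_low, x_high


def g_example(x, t):
    """Hypothetical map of [0, 1] into itself, continuous in x, non-decreasing in t."""
    return min(1.0, max(0.0, 0.5 * x + 0.25 * t))


if __name__ == "__main__":
    for t in (0.0, 0.4, 1.0, 1.6):
        print(t, extreme_fixed_points(g_example, t))
```

In this example the two extreme fixed points coincide (up to the grid resolution) and move upward as $t$ increases, as the corollary predicts.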

B. Proof of Theorem 1

$\Gamma$ is a symmetric potential game, with potential function $\Phi : [0, 1/\sigma^2]^n \to \mathbb{R}$, s.t., for each $\lambda \in [0, 1/\sigma^2]^n$,

$$\Phi(\lambda) = \sum_{j \in N} c(\lambda_j) + f(\lambda). \tag{23}$$

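By the definition of potential game, the set of Nash equilibria of $\Gamma$ is contained in the set of local minima of the function $\Phi$. Then, as $\Phi$ has a unique local minimum $\lambda^* \in [0, 1/\sigma^2]^n$, it has to coincide with the unique Nash equilibrium of $\Gamma$. In particular, the optimum $\lambda^*$ is such that $\lambda^* \neq (0, \ldots, 0)$ and for each $i \in N$, $\lambda_i^*$ satisfies the following KKT conditions

$$\begin{cases} -\dfrac{1}{\left(\sum_{j \in N} \lambda_j^*\right)^2} F'\!\left(\dfrac{1}{\sum_{j \in N} \lambda_j^*}\right) + c'(\lambda_i^*) - \psi_i^* + \phi_i^* = 0, \\[2mm] \psi_i^* \lambda_i^* = 0, \quad \phi_i^* (\lambda_i^* - 1/\sigma^2) = 0, \quad \psi_i^*, \phi_i^* \geq 0. \end{cases} \tag{24}$$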


Observe that, as a consequence of the assumption that $c'(0) = 0$, $\lambda_i^* > 0$ for each $i \in N$. In fact, if we suppose there exists $i \in N$ s.t. $\lambda_i^* = 0$, the $i$-th equation of the KKT conditions cannot be satisfied, as

$$-\frac{1}{\left(\sum_{j \in N} \lambda_j^*\right)^2} F'\!\left(\frac{1}{\sum_{j \in N} \lambda_j^*}\right) - \psi_i^* < 0,$$

because $\psi_i^* \geq 0$ and $F' > 0$ as $F$ is strictly convex. Moreover, as $\Phi$ is a symmetric function on a symmetric domain, the only minimum is symmetric, i.e., $\lambda_i^* = \lambda^*$ for each $i \in N$.


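C. Proof of Proposition 1

From (24), $\lambda^*$ is the unique solution of the following fixed point problem

$$\lambda = g(n, \lambda), \tag{25}$$

where the function $g : \mathbb{N}_+ \times [0, 1/\sigma^2] \to [0, +\infty]$ is defined for each $\lambda \in (0, 1/\sigma^2]$ and for each $n \in \mathbb{N}_+$ as

$$g(n, \lambda) = \frac{1}{n^2 \lambda\, c'(\lambda)} F'\!\left(\frac{1}{n\lambda}\right), \tag{26}$$

and by continuity as $\lim_{\lambda \to 0^+} g(n, \lambda)$ in $\lambda = 0$ for each $n \in \mathbb{N}_+$.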

We consider this fixed point problem, but with the parameter $n$ defined on the real interval $[1, +\infty]$. For each $n \in [1, +\infty]$, $g$ is continuous in $\lambda$. Function $g$ is monotonic non-increasing in $n$. In fact,

$$\frac{\partial g}{\partial n} = -\frac{1}{n^4 \lambda^2 c'(\lambda)} F''\!\left(\frac{1}{n\lambda}\right) - \frac{2 n c'(\lambda)}{n^4 \lambda\, c'(\lambda)^2} F'\!\left(\frac{1}{n\lambda}\right) < 0.$$

Applying Corollary 1 of Milgrom and Roberts [67], recalled in Appendix A (with parameter $t = -n$), the unique fixed point $\lambda^*(n)$ is non-increasing in $n$, and this proves (i).

To prove (ii), we observe that

$$\lim_{n \to +\infty} g(n, \lambda) = 0 \quad \text{(zero function)},$$

and the unique fixed point of the zero function is 0.
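As an illustration of (i) and (ii), which is not part of the original proof, consider the special case $c(\lambda) = c\lambda^k$ with $F(x) = x$. This choice is an assumption made here only for concreteness; the argument above covers general $c$ and $F$. The interior first-order condition from (24) can then be solved explicitly:

```latex
\begin{align*}
c'(\lambda) = \frac{1}{(n\lambda)^2}\, F'\!\left(\frac{1}{n\lambda}\right)
  \;\Longrightarrow\; c\,k\,\lambda^{k-1} = \frac{1}{n^2 \lambda^2}
  \;\Longrightarrow\; \lambda^*(n) = \left(c\,k\,n^2\right)^{-\frac{1}{k+1}} .
\end{align*}
```

As long as the cap $1/\sigma^2$ is not binding, this expression is decreasing in $n$ and tends to 0 as $n \to +\infty$, in line with (i) and (ii).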


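D. Proof of Proposition 2

If $\lambda^*(n) = 1/\sigma^2$ or $\lambda^*(n+1) = 1/\sigma^2$, (i) is trivial. Suppose that both $\lambda^*(n) \neq 1/\sigma^2$ and $\lambda^*(n+1) \neq 1/\sigma^2$. Moreover, suppose by contradiction that

$$\frac{1}{n\lambda^*(n)} < \frac{1}{(n+1)\lambda^*(n+1)}. \tag{27}$$

It follows that

$$F'\!\left(\frac{1}{n\lambda^*(n)}\right) \leq F'\!\left(\frac{1}{(n+1)\lambda^*(n+1)}\right) \tag{28}$$

because of the strict convexity of $F$. Moreover, as $\lambda^*(n) \geq \lambda^*(n+1)$ (by Corollary 1), it follows that

$$c'(\lambda^*(n)) \geq c'(\lambda^*(n+1)) \tag{29}$$

because of the strict convexity of $c$. From (28) and (29), it follows that

$$\frac{1}{n\lambda^*(n)} = \frac{1}{n \sqrt{F'\!\left(\frac{1}{n\lambda^*(n)}\right) \frac{1}{n^2 c'(\lambda^*(n))}}} \geq \frac{1}{(n+1) \sqrt{F'\!\left(\frac{1}{(n+1)\lambda^*(n+1)}\right) \frac{1}{(n+1)^2 c'(\lambda^*(n+1))}}} = \frac{1}{(n+1)\lambda^*(n+1)},$$

which contradicts (27) and then proves (i).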

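To prove (ii), observe that, because of Proposition 1(ii) and because of Assumption 1,

$$\lim_{n \to +\infty} c'(\lambda^*(n)) = 0.$$

If, by contradiction,

$$\lim_{n \to +\infty} \frac{1}{n\lambda^*(n)} > 0, \tag{30}$$

then

$$\lim_{n \to +\infty} F'\!\left(\frac{1}{n\lambda^*(n)}\right) > 0,$$

because of the strict convexity of $F$, and consequently

$$\lim_{n \to +\infty} \frac{1}{n\lambda^*(n)} = \lim_{n \to +\infty} \frac{1}{n \sqrt{F'\!\left(\frac{1}{n\lambda^*(n)}\right) \frac{1}{n^2 c'(\lambda^*(n))}}} = 0,$$

which contradicts (30) and then proves (ii).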


E. Proof of Theorem 2

First, observe that for each $S \subseteq N$, with $s \geq 1$, and for each $\eta \in [0, 1/\sigma^2]$, the game $\Gamma(S, \eta)$ is still a potential game, with potential function $\Phi$ as in (23), but defined on the domain $[\{0\} \cup [\eta, 1/\sigma^2]]^s$. The set of Nash equilibria of $\Gamma(S, \eta)$ is included in the set of the local minima of $\Phi$ on this new domain.

When $s = 1$, the potential function and the cost function of the only agent coincide. Then, a strategy profile is a Nash equilibrium if and only if it is a global minimum of $\Phi$. If $\eta \leq \lambda^*(1)$, then the only global minimum of $\Phi$ is still $\lambda^*(1)$. If $\eta > \lambda^*(1)$, then the only global minimum is $\eta$.

Now, let $s > 1$. We define the function $g : \mathbb{N}_+ \times [0, 1/\sigma^2] \to [0, +\infty]$ s.t., for each $\eta \in (0, 1/\sigma^2]$ and for each $n \in \mathbb{N}_+$



and we extend it by continuity in $\eta = 0$ for each $n \in \mathbb{N}_+$. We consider the following fixed point problem

$$\eta = g(s, \eta), \tag{32}$$

and we show that this fixed point problem has a unique solution. To prove that, we show first that equation

$$c(\eta) = F\!\left(\frac{1}{(s-1)\eta}\right) - F\!\left(\frac{1}{s\eta}\right), \tag{33}$$

in the $\eta$ variable, has at most one solution in $[0, 1/\sigma^2]$. This can be seen by noticing that, for each $s > 1$, the difference $\frac{1}{(s-1)\eta} - \frac{1}{s\eta}$ is decreasing in $\eta$. Moreover, function $F$ is strictly convex and non-increasing in $\eta$, and this implies that the difference

$$F\!\left(\frac{1}{(s-1)\eta}\right) - F\!\left(\frac{1}{s\eta}\right) \tag{34}$$

is a decreasing function of $\eta$. As $c$ is a non-decreasing function of $\eta$, it follows that (33) has at most one solution in the given interval.

The fixed point of (32) is given by this solution (if it exists), or by $1/\sigma^2$ otherwise, and then it is unique. We denote this unique fixed point by $\eta^*(s)$.
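Because the left side of the indifference condition is non-decreasing in $\eta$ and the difference of the $F$ terms is decreasing, $\eta^*(s)$ can be located numerically by bisection. The sketch below is a hedged illustration: `cost` and `F` are user-supplied callables standing for $c$ and $F$, the function name and tolerance are hypothetical, and the routine assumes the difference of the $F$ terms exceeds the cost for $\eta$ close to 0.

```python
def eta_star_bisect(s, cost, F, sigma2=1.0, tol=1e-12):
    """Solve cost(eta) = F(1/((s-1)*eta)) - F(1/(s*eta)) on (0, 1/sigma2].

    Returns 1/sigma2 when no solution exists in the interval, matching the
    convention used for the fixed point eta*(s).
    """
    upper = 1.0 / sigma2
    if s < 2:
        return upper  # with a single agent, dropping out is never attractive

    def gap(eta):
        # Benefit of staying in (reduction of estimation cost) minus privacy cost.
        return F(1.0 / ((s - 1) * eta)) - F(1.0 / (s * eta)) - cost(eta)

    if gap(upper) >= 0.0:
        return upper
    lo, hi = 0.0, upper  # gap > 0 just above lo (assumed), gap(hi) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, with `cost = lambda x: x**2` and `F = lambda v: v`, the result for $s = 10$ can be compared against the closed form $(s(s-1))^{-1/3}$ that holds for this particular choice.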

Looking for the Nash equilibria when $s > 1$, at first we focus on the ones which are in $[\eta, 1/\sigma^2]^s$. In particular, we can distinguish the three following subcases (observe that, in case we have that $\lambda^*(s) \geq \eta^*(s)$, this simply implies that the subcase (ib) will never occur):

(ia) When $\eta \in [0, \lambda^*(s)]$, as $\lambda^*(s)$ is the unique local minimum of the potential function on the domain $[0, 1/\sigma^2]^s$ and as $\lambda^*(s) \in [\eta, 1/\sigma^2]^s$, then, because of the convexity, it is still the only local minimum of the potential function on $[\eta, 1/\sigma^2]^s$. In particular, it is a Nash equilibrium of $\Gamma(S, \eta)$. In fact, if there exists a deviation of agent $i \in S$ for the game $\Gamma(S, \eta)$ which makes her cost function smaller, it would be a feasible deviation which makes her cost function smaller also for the game $\Gamma(S, 0)$, and this would contradict the fact that $\lambda^*(s)$ is a Nash equilibrium of $\Gamma(S, 0)$.

(ib) When $\eta \in (\lambda^*(s), \eta^*(s)]$, the vector $\boldsymbol{\eta} = (\eta)_{i \in S}$ is the only local minimum of the potential function on $[\eta, 1/\sigma^2]^s$. In particular, it is a Nash equilibrium. In fact, because of the strict convexity of the potential function, any deviation of agent $i \in S$ to a precision level in $(\eta, 1/\sigma^2]$ would make her cost function bigger. Moreover, if agent $i \in S$ deviates to 0, her cost function cannot become smaller. In fact, we have that, from (33), when $\eta \leq \eta^*(s)$,

$$F\!\left(\frac{1}{(s-1)\eta}\right) \geq c(\eta) + F\!\left(\frac{1}{s\eta}\right).$$

The term on the left represents the cost of agent $i$ deviating to zero, and the term on the right denotes the cost when she decides to keep on choosing a precision equal to $\eta$.

(ii) When $\eta \in (\eta^*(s), 1/\sigma^2]$, the only local minimum in $[\eta, 1/\sigma^2]^s$ is $\boldsymbol{\eta}$. But this is not a Nash equilibrium. In fact, still because of (33), when $\eta > \eta^*(s)$,

$$F\!\left(\frac{1}{(s-1)\eta}\right) < c(\eta) + F\!\left(\frac{1}{s\eta}\right),$$

and this means that an agent can make her cost function smaller by deviating to zero.

We can now remark that, as $\lambda^*(s)$ is a Nash equilibrium for $\Gamma(S, 0)$, it implies that, by playing that strategy, the agents do not have incentives to deviate to zero. As $\eta^*(s)$ is the maximum minimum precision level s.t., if the agents are playing $\Gamma(S, \eta^*(s))$, they do not have incentives to deviate from $\eta^*(s)$, it follows that $\lambda^*(s) \leq \eta^*(s)$.

We proved that when $\eta \in (\eta^*(s), 1/\sigma^2]$, there does not exist a Nash equilibrium of $\Gamma(S, \eta)$ in $[\eta, 1/\sigma^2]^s$, and this proves Theorem 2(ii). We have also proved that when $\eta \in [0, \eta^*(s)]$, there exists a unique Nash equilibrium of $\Gamma(S, \eta)$ in $[\eta, 1/\sigma^2]^s$. In order to prove that there do not exist other equilibria with a zero component (and then, in order to prove Theorem 2(i)), we first state the following lemma.

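Lemma 3: Suppose that $\lambda' = (\lambda'_1, \ldots, \lambda'_s)$ is a local minimum of the potential function $\Phi$ on $[\{0\} \cup [\eta, 1/\sigma^2]]^s$, with $\eta \in [0, 1/\sigma^2]$, and call $T = \{i \in S : \lambda'_i = 0\}$, with $t = |T|$. Then, $\lambda'$ is a local minimum on $\{0\}^t \times [\eta, 1/\sigma^2]^{s-t}$ and it is s.t. $\lambda'_i = \lambda'$ for each $i \in S \setminus T$, with

$$\lambda' = \begin{cases} \lambda^*(s-t) & \text{if } 0 \leq \eta \leq \lambda^*(s-t), \\ \eta & \text{if } \lambda^*(s-t) < \eta \leq 1/\sigma^2. \end{cases} \tag{35}$$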


Suppose now that there exists a local minimum $\lambda'$ s.t. $\lambda'_i = 0$ for at least one $i \in S$ and call $T = \{i \in S : \lambda'_i = 0\}$, with $t = |T| \geq 1$, the set of agents who are at a zero precision level. Then, because of Lemma 3, $\lambda'_i = \lambda'$ for each $i \in S \setminus T$ and it is given by (35). We show that this cannot be a Nash equilibrium. In fact, we have that,

$$c(\lambda') \leq F\!\left(\frac{1}{(s-1)\lambda'}\right) - F\!\left(\frac{1}{s\lambda'}\right) < F\!\left(\frac{1}{(s-t)\lambda'}\right) - F\!\left(\frac{1}{(s-t+1)\lambda'}\right) \tag{36}$$

when $t > 1$. The first inequality follows from (33) and from the fact that $\lambda' \in [0, \eta^*(s)]$. The second inequality follows from the fact that, fixed $\eta$, the difference in (34) is a decreasing function also of $s$. From (36), it follows that if an agent in $T$ deviates, moving from the precision level 0 to the precision level $\lambda'$, she can strictly decrease her cost function.

This proves that any local minimum of $\Phi$ s.t. at least one agent chooses a zero precision level cannot be a Nash equilibrium. Then, when $s > 1$, the equilibrium is unique and it is given by (8), with $\eta^*(s)$ the unique solution of (32).

F. Proof of Theorem 3

We have already seen in the proof of Theorem 2 that for each $S \subseteq N$, $\lambda^*(s) \leq \eta^*(s)$. It follows that $\sigma_M^2(\eta^*(s)) \leq \sigma_M^2(\lambda^*(s))$. This means that, for a fixed number of agents $s$, it is optimal, for the analyst, to choose a minimum precision level equal to $\eta^*(s)$.

Step 1: First, we show that, if $\lambda^*(s) \neq 1/\sigma^2$, this inequality is strict, meaning that $\lambda^*(s) < \eta^*(s)$ and the analyst can strictly improve the estimation by choosing $\eta^*(s)$ instead of 0 as minimum precision level. In fact, if $\lambda^*(s) = \eta^*(s)$, it follows that $\lambda^*(s)$ is s.t.

$$c(\lambda^*(s)) = F\!\left(\frac{1}{(s-1)\lambda^*(s)}\right) - F\!\left(\frac{1}{s\lambda^*(s)}\right).$$

But then, at equilibrium, the potential function $\Phi$ is s.t.

$$\begin{aligned} \Phi(\lambda^*(s)) &= F\!\left(\frac{1}{s\lambda^*(s)}\right) + s\,c(\lambda^*(s)) \\ &= F\!\left(\frac{1}{s\lambda^*(s)}\right) + (s-1)\,c(\lambda^*(s)) + F\!\left(\frac{1}{(s-1)\lambda^*(s)}\right) - F\!\left(\frac{1}{s\lambda^*(s)}\right) \\ &= F\!\left(\frac{1}{(s-1)\lambda^*(s)}\right) + (s-1)\,c(\lambda^*(s)). \end{aligned}$$

This implies that the potential function is minimal for an agent $i$ choosing $\lambda_i^*(S) = 0$, and this contradicts the fact that the equilibrium of $\Phi$ is unique, symmetric and s.t. $\lambda^* \neq (0, \ldots, 0)$. It follows that, for each $S \subseteq N$, $\lambda^*(s) < \eta^*(s)$.

Step 2: Second, we observe that $\eta^*(s)$ is nonincreasing in $s$. This is because $\eta^*(s)$ is the unique fixed point of the problem in (32), and the function $g(s, \eta)$ is continuous, nondecreasing in $\eta$ and nonincreasing in $s$. Then, applying Corollary 1 of Milgrom and Roberts [67], recalled in Appendix A, to it (with parameter $t = -s$), we have the result.

Step 3: Finally, we show that $\sigma_M^2(\eta^*(s))$ is nonincreasing in $s$, and then, that it is optimal, for the analyst, to collect data from the largest possible number of agents. We suppose, by contradiction, that there exists $k \in \mathbb{N}_+$ s.t. $\sigma_M^2(\eta^*(k)) < \sigma_M^2(\eta^*(k+1))$, or equivalently s.t. $k \cdot \eta^*(k) > (k+1) \cdot \eta^*(k+1)$. We have shown in Step 2 that $\eta^*(s)$ is nonincreasing in $s$, then $\eta^*(k) \geq \eta^*(k+1)$. Suppose $\eta^*(k) \neq 1/\sigma^2$ (otherwise, the result is trivial). By definition, $\eta^*(k)$ is the solution of (32) for $s = k$ and $\eta^*(k+1)$ for $s = k+1$. We write the equation in (32) as

(37)

where $t_1 = k \cdot \eta^*(k)$. Similarly, we write


where $t_0 = (k+1) \cdot \eta^*(k+1)$. Because of the hypothesis by contradiction, $t_0 < t_1$; moreover the difference on the left in (37) is increasing as a function of the parameter. We may apply a straightforward adaptation of Milgrom and Roberts’ Corollary 1 [67], recalled in Appendix A (on the right we do not have a linear function of $\eta$ as in the original statement, but a strictly increasing function of $\eta$) and we obtain that $\eta^*(k) < \eta^*(k+1)$, contradicting what we have shown in Step 2.


We have shown that for the analyst it is not optimal to implement the game with only a subset of the agents. Moreover, for the analyst it is not optimal to choose a minimum precision level $\eta > \eta^*(n)$. In fact, in that case, as we have seen in Section IV-B, if there exists an equilibrium, it is an equilibrium s.t. only a subset of agents choose a non-zero precision level, and this leads back to the previous case.


G. Proof of Theorem 4

The proof follows the proof of Theorem 1. In particular, the unique Nash equilibrium satisfies the KKT conditions in (24) (with heterogeneous privacy costs), from which it still follows that, because of the assumption that $c_i'(0) = 0$ for each $i \in N$, $\lambda_i^* \neq 0$ for each $i \in N$. Given $i \in N$, the corresponding equilibrium precision level is given by the solution of

$$c_i'(\lambda_i^*) = \frac{F'(\sigma_M^2(\lambda^*))}{\left(\sum_{j \in N} \lambda_j^*\right)^2}, \tag{38}$$

if the solution is smaller than or equal to $1/\sigma^2$, or by $1/\sigma^2$ otherwise.

As the right term is the same for each $i \in N$, it immediately follows that, if the $c_i'$s are s.t. $c_1'(\lambda) \leq \ldots \leq c_n'(\lambda)$, for each $\lambda \in [0, 1/\sigma^2]$, then $\lambda_n^* \leq \ldots \leq \lambda_1^*$.
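The ordering result can be checked numerically. The sketch below is an illustration under assumptions that are not part of the general statement: heterogeneous monomial costs $c_i(\lambda) = c_i \lambda^k$ and $F(x) = x$, so that the equilibrium condition reads $c_i k \lambda_i^{k-1} = 1/\Lambda^2$ with $\Lambda = \sum_j \lambda_j$. Each $\lambda_i$ is then a decreasing function of $\Lambda$, and the consistent total $\Lambda$ is found by bisection. The function name is hypothetical.

```python
def heterogeneous_equilibrium(costs, k=2, sigma2=1.0, tol=1e-12):
    """Equilibrium precisions for costs c_i * lam**k and F(x) = x (assumed forms).

    For a candidate total precision T, agent i's condition c_i*k*lam**(k-1) = 1/T**2
    gives lam_i(T), capped at 1/sigma2. The equilibrium total solves sum_i lam_i(T) = T.
    """
    upper = 1.0 / sigma2

    def responses(total):
        return [min((1.0 / (ci * k * total**2)) ** (1.0 / (k - 1)), upper)
                for ci in costs]

    lo, hi = 0.0, len(costs) * upper  # sum of responses exceeds T near 0, not at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(responses(mid)) > mid:
            lo = mid
        else:
            hi = mid
    return responses(0.5 * (lo + hi))


if __name__ == "__main__":
    # Agents are listed from least to most privacy-concerned.
    print(heterogeneous_equilibrium([1.0, 2.0, 4.0]))
```

With costs listed in increasing order, the returned precisions come out in decreasing order, so the most privacy-concerned agent contributes the least precise data.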


H. Proof of Proposition 3

From Equation (38), as soon as agent $n+1$ enters the game, fixing the strategies of the other agents, the term on the right decreases. In order to balance the equality at best response, and because of the convexity of the privacy cost $c_i$, fixing the strategy of the other agents, each agent $i \in N$ will choose a precision level which is smaller than the precision level at best response without agent $n+1$. As a consequence, at equilibrium, $\lambda_i^*(N \cup \{n+1\}) < \lambda_i^*(N)$ for each $i \in N$.


I. Proof of Proposition 4

We write Equation (38) as

$$\frac{c_i'(\lambda_i^*)}{\sigma_M^4(\lambda^*)} = F'(\sigma_M^2(\lambda^*)).$$

We suppose by contradiction that $\sigma_M^2(\lambda^*(N \cup \{n+1\})) > \sigma_M^2(\lambda^*(N))$. Then, $F'(\sigma_M^2(\lambda^*(N \cup \{n+1\}))) \geq F'(\sigma_M^2(\lambda^*(N)))$, because of the convexity of $F$. Moreover, from Proposition 3 we know that $\lambda_i^*(N \cup \{n+1\}) < \lambda_i^*(N)$, and then $c_i'(\lambda_i^*(N \cup \{n+1\})) \leq c_i'(\lambda_i^*(N))$ because of the convexity of the $c_i$s. It follows that

$$\frac{\sigma_M^4(\lambda^*(N \cup \{n+1\}))}{c_i'(\lambda_i^*(N \cup \{n+1\}))} > \frac{\sigma_M^4(\lambda^*(N))}{c_i'(\lambda_i^*(N))} = \frac{1}{F'(\sigma_M^2(\lambda^*(N)))} \geq \frac{1}{F'(\sigma_M^2(\lambda^*(N \cup \{n+1\})))},$$

and this contradicts the supposition by contradiction.

J. Proof of Theorem 5

At first, we recall that we denote by $\lambda^*(S)$ the unique Nash equilibrium of the game $\Gamma(S, 0)$. Then, for each $S \subseteq N$, with $s \geq 1$, and for each $\eta \in [0, 1/\sigma^2]$, we observe that the game $\Gamma(S, \eta)$ is still a potential game, with potential function $\Phi$ as in (23), but defined on the domain $[\{0\} \cup [\eta, 1/\sigma^2]]^s$. The set of Nash equilibria of $\Gamma(S, \eta)$ is included in the set of the local minima of $\Phi$ on this new domain.

When $s = 1$, the potential function and the cost function of the only agent coincide. Then, a strategy profile is a Nash equilibrium if and only if it is a global minimum of $\Phi$. If $\eta \leq \lambda^*(\{1\})$, then the only global minimum of $\Phi$ is still $\lambda^*(\{1\})$. If $\eta > \lambda^*(\{1\})$, then the only global minimum is $\eta$.

Now, let $s > 1$. By definition of Nash equilibrium, the unique NE $\lambda^*(S)$ is s.t.

$$c_n(\lambda_n^*(S)) \leq F\!\left(\frac{1}{\sum_{j \in N, j \neq n} \lambda_j^*(S)}\right) - F\!\left(\frac{1}{\sum_{j \in N} \lambda_j^*(S)}\right),$$

which translates the fact that agent $s$ does not have incentives to deviate to zero.

Step 1: First, we show that

$$c_n(\lambda_n^*(S)) < F\!\left(\frac{1}{\sum_{j \in N, j \neq n} \lambda_j^*(S)}\right) - F\!\left(\frac{1}{\sum_{j \in N} \lambda_j^*(S)}\right).$$

By contradiction, if

$$c_n(\lambda_n^*(S)) = F\!\left(\frac{1}{\sum_{j \in N, j \neq n} \lambda_j^*(S)}\right) - F\!\left(\frac{1}{\sum_{j \in N} \lambda_j^*(S)}\right),$$

then at equilibrium the potential function $\Phi$ is s.t.

$$\Phi(\lambda^*(S)) = \sum_{j \in N} c_j(\lambda_j^*(S)) + F\!\left(\frac{1}{\sum_{j \in N} \lambda_j^*(S)}\right) = \sum_{j \in N, j \neq n} c_j(\lambda_j^*(S)) + F\!\left(\frac{1}{\sum_{j \in N, j \neq n} \lambda_j^*(S)}\right).$$

This implies that the potential function is minimal for $\lambda_n^*(S) = 0$, and this contradicts the fact that the equilibrium is unique and s.t. no agent is playing zero.

Step 2: Now, let $\eta^*(S)$ be s.t.

$$c_n(\lambda_n^*(S, \eta^*(S))) = F\!\left(\frac{1}{\sum_{j \in N, j \neq n} \lambda_j^*(S, \eta^*(S))}\right) - F\!\left(\frac{1}{\sum_{j \in N} \lambda_j^*(S, \eta^*(S))}\right), \tag{39}$$

where $\lambda^*(S, \eta^*(S))$ is the local minimum of $\Phi$ on $[\eta^*(S), 1/\sigma^2]^s$. Note that this $\eta^*(S)$ is unique (as usual, because the difference of the $F$s is a decreasing function and the $c$ is increasing). We show that $\eta^*(S) > \lambda_n^*(S)$. In fact, if $\eta^*(S) \leq \lambda_n^*(S)$, then $\lambda_j^*(S, \eta^*(S)) = \lambda_j^*(S)$ for each $j \in N$, and we have shown in Step 1 that the equality in (39) does not hold for $\lambda^*(S)$.

Step 3: We just need to show that this is a Nash equilibrium of $\Gamma(S, \eta^*(S))$. At first, observe that no agent has incentives to deviate to a quantity in $(\eta^*(S), 1/\sigma^2]$, because of the convexity of $\Phi$. It remains to be shown that no agent has incentives to deviate to zero. Agent $s$ does not have incentives by (39). For any other agent $i \neq n$, s.t. $\lambda_i^*(S, \eta^*(S)) = \lambda_n^*(S, \eta^*(S))$, if agent $s$, who is the most privacy concerned, does not have incentives to deviate from $\lambda_n^*(S, \eta^*(S))$, that is still valid for $i$. For any other agent $i \neq n$, s.t. $\lambda_i^*(S, \eta^*(S)) > \lambda_n^*(S, \eta^*(S))$, if $i$ does not have incentives to deviate to $\eta^*(S)$, then, because of the convexity of the cost function, she cannot have incentives to deviate to 0.

K. Proof of Theorem 6

At first, because of Proposition 4, for each $S \subseteq N$, $\sigma_M^2(\lambda^*(S)) \geq \sigma_M^2(\lambda^*(N))$. Then, for the analyst it is more convenient to have the complete set of agents playing. Moreover, from the KKT conditions in Equation (38), when implementing $\Gamma$,

$$c_n'(\lambda_n^*(N)) = \frac{F'(\sigma_M^2(\lambda^*(N)))}{\left(\sum_{j \in N} \lambda_j^*(N)\right)^2},$$

as we assumed that $\lambda_n^*(N) \neq 1/\sigma^2$ (otherwise the estimation would have been already optimal). When implementing $\Gamma(N, \eta^*(N))$, either we have that $\lambda_n^*(N, \eta^*(N)) = 1/\sigma^2$, and in this case we have proved our result. In fact, it follows that every agent is playing $1/\sigma^2$ and that the estimation is now optimal. If $\lambda_n^*(N, \eta^*(N)) \neq 1/\sigma^2$, then

$$c_n'(\lambda_n^*(N, \eta^*(N))) = \frac{F'(\sigma_M^2(\lambda^*(N, \eta^*(N))))}{\left(\sum_{j \in N} \lambda_j^*(N, \eta^*(N))\right)^2}. \tag{40}$$

As

$$\lambda_n^*(N, \eta^*(N)) \geq \eta^*(N) > \lambda_n^*(N),$$

it follows that

$$c_n'(\lambda_n^*(N, \eta^*(N))) > c_n'(\lambda_n^*(N)), \tag{41}$$

because of the convexity of c n . Assume by contradiction that 

°m(A*(IV, t?)) > o'M (A* (IV)), 

it follows that 

F 1 (N, 77*(IV)))) F’{al,{X\N))) 

(Eigat A)(IV, ?7*(IV))) 2 - (Eieiv A)(IV, )) 2 ’ 






















and this contradicts <ED- 

L. Proof of Lemma 1

$\Gamma^p(\eta)$ is still a potential game, with potential function $\Phi$ as in (23), but defined on the domain $[\eta, 1/\sigma^2]^p$. The set of Nash equilibria of $\Gamma^p(\eta)$ is included in the set of the local minima of $\Phi$ on this new domain. The unique local minimum of $\Phi$ is given by $\lambda^*(p, \eta) = \lambda^*(p)$, if $\lambda^*(p) \geq \eta$, and by $\lambda^*(p, \eta) = \eta$ otherwise. In both cases, this is a Nash equilibrium, because of the convexity of the potential function (any deviation of an agent would make her cost function bigger).


M. Proof of Lemma 2

Because of Lemma 1, given a vector $p$ in the first stage,
in the second stage the agents in $P$ choose a
precision level as in (21).

First, we observe that $(1, \ldots, 1)$ is a Nash equilibrium
when $\eta \in [\lambda^*(n-1), \eta^*(n)]$. As $\lambda^*(n) < \lambda^*(n-1) \le \eta$, if
$p = (1, \ldots, 1)$, the agents play $\eta$ at equilibrium in the
second stage, and if an agent deviates to
$p_i = 0$, the remaining $n - 1$ agents still play $\eta$
at equilibrium. Then, by deviating to $p_i = 0$, agent $i$ cannot
make her cost smaller, as

\[
\frac{1}{n\eta} + c\eta^k \le \frac{1}{(n-1)\eta}
\]

by (33), as $\eta \le \eta^*(n)$, where the left-hand side is her cost
before deviation and the right-hand side is her cost after
deviation.
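
The participation condition used here compares the cost $1/(n\eta) + c\eta^k$ of a participating agent with the cost $1/((n-1)\eta)$ she would bear after opting out. The following minimal Python sketch checks that comparison numerically. It assumes the homogeneous cost $c\eta^k$ and an estimation cost equal to the variance, as the inequality suggests, and it ignores the upper limit $1/\sigma^2$. The function names are hypothetical, and whether the closed-form threshold coincides with $\eta^*(n)$ as defined in (33) is an assumption.

```python
def cost_if_participating(n, eta, c, k):
    """Cost of one agent when all n agents report at precision eta."""
    return 1.0 / (n * eta) + c * eta ** k


def cost_if_opting_out(n, eta):
    """Cost of an agent who opts out while the other n - 1 keep playing eta."""
    return 1.0 / ((n - 1) * eta)


def participation_threshold(n, c, k):
    """Largest eta for which participating is no more costly than opting out.

    Rearranging 1/(n*eta) + c*eta**k <= 1/((n-1)*eta) gives
    eta**(k+1) <= 1 / (c*n*(n-1)).  Assumed to play the role of eta*(n).
    """
    return (1.0 / (c * n * (n - 1))) ** (1.0 / (k + 1))


if __name__ == "__main__":
    n, c, k = 3, 1.0, 2  # illustrative values, not taken from the paper
    print("threshold:", participation_threshold(n, c, k))
    for eta in (0.5, 0.6):
        stays = cost_if_participating(n, eta, c, k) <= cost_if_opting_out(n, eta)
        print(f"eta={eta}: participation is a best response -> {stays}")
```

With these illustrative values, participation is sustained at $\eta = 0.5$ (below the threshold) but not at $\eta = 0.6$ (above it).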

To prove that there are no other Nash equilibria, let $\eta \in
[\lambda^*(n-1), \eta^*(n)]$. Suppose by contradiction that there exists
an equilibrium such that the set $N \setminus P$ of agents who choose zero
in the first stage is non-empty. Then, the agents in $P$ choose
$\lambda^*(p, \eta)$ at equilibrium in the second stage. If $\lambda^*(p) <
\lambda^*(p-1) < \eta$, then an agent in $N \setminus P$ has an incentive to deviate
by choosing $p_i = 1$. The same happens if $\eta < \lambda^*(p) < \lambda^*(p-1)$.
If instead $\lambda^*(p) < \eta < \lambda^*(p-1)$, then the agents in $P$ have
an incentive to deviate by choosing $p_i = 0$.


N. Proof of Theorem 7

First, we observe that, because of Lemma 2,
$\sigma^2_M(\lambda^*(n, \eta^*(n))) < \sigma^2_M(\lambda^*(n, \eta))$ for each $\eta \in [\lambda^*(n-1), \eta^*(n))$.
In fact, when $\eta \in [\lambda^*(n-1), \eta^*(n)]$, at the unique
equilibrium every agent chooses to participate in the first
stage and then chooses the same non-zero precision
level $\lambda^*(n, \eta)$; hence the estimation cost is minimal when
this precision level is maximal, i.e., when it equals $\eta^*(n)$.

When $\eta \in [0, \lambda^*(n)]$, for every vector $p \in \{0,1\}^n$
in the first stage, in the second stage the agents in $N \setminus P$
choose a precision level $\lambda^*(n-p)$ and the estimation cost is
$\sigma^2_M(\lambda^*(n-p)) \ge \sigma^2_M(\lambda^*(n))$ because of Corollary 2. When
$\eta \in [\lambda^*(n), \lambda^*(n-1)]$, for every vector $p \in \{0,1\}^n$ with
$p < n - 1$ we have again, as before, a non-optimal estimation,
while $p = (1, \ldots, 1)$ is not a Nash equilibrium.
In fact,

\[
\frac{1}{(n-1)\lambda^*(n-1)} < \frac{1}{n\eta} + c\eta^k
\]

for each $k \ge 2$, and this means that each agent can make her
cost smaller by deviating to zero.

Finally, when $\eta \in [\eta^*(n), 1/\sigma^2]$, for every vector $p \in
\{0,1\}^n$ in the first stage, in the second stage the agents in
$N \setminus P$ choose a precision level equal to $\lambda^*(n-p)$ or equal
to $\eta$ and, as we have already seen in the proof of Theorem
3, this does not provide a minimum value for the estimation
cost.


O. Proof of Theorem 8

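We first prove that $J_a(n)$ is an eventually increasing
sequence, i.e., that $J_a(n) > J_a(n-1)$ implies $J_a(n+1) >
J_a(n)$. We have that

\begin{align*}
J_a(n) > J_a(n-1)
&\iff F\left(\frac{1}{n\eta^*(n)}\right) + Cn > F\left(\frac{1}{(n-1)\eta^*(n-1)}\right) + C(n-1) \\
&\iff C > F\left(\frac{1}{(n-1)\eta^*(n-1)}\right) - F\left(\frac{1}{n\eta^*(n)}\right).
\end{align*}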

As the right-hand side decreases as $n$ increases, while the left-hand
side is constant, the inequality holds for all sufficiently
large $n$. Looking for the optimal $n^*$, we need
to find the highest $n$ such that the previous inequality does not hold,
i.e., such that

\[
C \le F\left(\frac{1}{(n-1)\eta^*(n-1)}\right) - F\left(\frac{1}{n\eta^*(n)}\right).
\]

It is now sufficient to observe that the term on the right
is equal, by definition of $\eta^*(n)$, to $c(\eta^*(n))$. The highest $n$
for which this inequality holds is the optimal number of
agents $n^*$. If this inequality is never satisfied, the
estimation cost is increasing in $n$, and then the optimal
number of agents is 1.
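
The search for $n^*$ described in this proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the analyst's cost is taken to be $J_a(n) = F(1/(n\eta^*(n))) + Cn$, as the displayed inequality suggests, and the functions `F` and `eta_star` below are hypothetical placeholders, not the paper's definitions. The sketch compares the threshold rule (highest $n$ with $C$ no larger than the marginal gain) against a brute-force minimization of $J_a$.

```python
import math


def analyst_cost(n, F, eta_star, C):
    """Assumed analyst cost J_a(n): estimation cost plus C per agent."""
    return F(1.0 / (n * eta_star(n))) + C * n


def marginal_gain(n, F, eta_star):
    """Drop in estimation cost when moving from n - 1 to n agents."""
    return F(1.0 / ((n - 1) * eta_star(n - 1))) - F(1.0 / (n * eta_star(n)))


def optimal_agents(F, eta_star, C, n_max):
    """Highest n in 2..n_max with C <= marginal gain; 1 if there is none."""
    best = 1
    for n in range(2, n_max + 1):
        if C <= marginal_gain(n, F, eta_star):
            best = n
    return best


# Hypothetical inputs, chosen only so that the example runs.
F = lambda v: v                          # identity estimation cost
eta_star = lambda n: 1.0 / math.sqrt(n)  # placeholder for eta*(n)

if __name__ == "__main__":
    C = 0.06
    n_rule = optimal_agents(F, eta_star, C, 50)
    n_brute = min(range(1, 51), key=lambda n: analyst_cost(n, F, eta_star, C))
    print(n_rule, n_brute)
```

With these placeholder inputs both approaches select the same number of agents. The agreement relies on the marginal gain decreasing in $n$, which is the property the proof invokes.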










